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PREFACE: 


As a result of improvements in fabrication technology, Large Scale Integrated (LSI) 
electronic circuitry has become so dense that a single silicon LSI chip may contain tens of 
thousands of transistors. Many LSI chips, such as microprocessors, now consist of multiple, 
complex subsystems, and thus are really integrated systems rather than integrated circuits. 

What we have seen so far is only the beginning. Achievable circuit density approximately 
doubles with each passing year. How long can this continue, and how small can the 
transistor be made? From the physics we find that the linear dimensions of transistors can 
be reduced to less than 1/10 of those in current integrated systems before they cease to 
function as the sort of switching elements from which we can easily build digital systems. It 
will eventually be possible to fabricate chips with hundreds of times as many components as 
today’s. The transistors in such very large scale integrated (VLSI) systems will ultimately 
have linear dimensions smaller than the wave-length of visible light. New, non-optical, 
high-resolution lithographic techniques are now being developed by many firms to enable 
fabrication of such circuitry. 

The emerging VLSI presents a challenge not only to those involved in development of 
appropriate fabrication technology, but also to computer scientists and computer architects. 
The ways in which digital systems are structured, the procedures used to design them, the 
tradeoffs between hardware and software, and the methodology and metrics of analysis of 
algorithms will all be greatly affected by the maturation of electronic technology towards its 
maximum density. We believe this will be an important area of activity for computer 
science on through the 1980’s. 

Until recently, the design of integrated electronic circuitry has been largely the province of 
circuit and logic designers working within semiconductor firms. Computer system architects 
have traditionally built systems from standard integrated circuits designed and manufactured 
by these firms, but haven't often participated in the specification and design of these 
integrated circuits. Electrical Engineering and Computer Science (EE/CS) curricula have 
reflected this tradition, with courses in device physics and integrated circuit design aimed at 
and generally taken by different students than those interested in digital system architecture 
and computer science. 

This text is written to fill the current gap by providing students of computer science and 
electrical engineering with an introduction to integrated system architecture and design. 
Combined with individual study in related research areas and participation in actual system 
design projects, the text could serve as a basis for a graduate course sequence in integrated 
systems. Portions could be used for an undergraduate text on the subject, or to augment a 
graduate course on computer architecture. It could also be used to extend, in the system 
direction, a classical electrical engineering course in integrated circuits. We assume the 
reader's background contains the equivalent of an introductory course sequence in computer 
science, and introductory courses in electronic circuits and digital design. 



Up till now there have been major obstacles in the path of those attempting to gain an 
overall understanding of integrated systems. Integrated electronics, developing in a heatedly 
competitive business environment, has proliferated into a large array of different device 
technologies, circuit design families, logic design techniques, maskmaking and wafer 
fabrication techniques, etc. The technologies have sprung up from the grass roots of 
"Silicon Valley" in California. Most participants in the industry have of necessity 
concentrated on rather narrow specialties. Texts on the subject have tended to give detailed 
accounts of some very narrow horizontal segment of the overall subject, such as device 
physics or circuit design. 

We have chosen instead to provide the minimum of basic information about devices, 
circuits, fabrication technology, logic design techniques, and system architecture, which is 
sufficient to enable the reader to fully span the entire range of abstractions, from the 
underlying physics to complete VLSI digital computer systems. A rather small set of 
carefully selected key concepts is all that is necessary for this purpose. We believe that only 
by carrying along the least amount of unnecessary mental baggage at each step in such a 
study, will the student emerge with a good overall understanding of the subject. Once this 
range of abstractions is spanned, the sequence of concepts can then be mapped into the 
reader's own space of application and technology. 

Another major obstacle has been the high rate of change of integrated electronics. The 
uninitiated could easily get the feeling that much energy could be invested in learning 
material which becomes obsolete as rapidly as it is assimilated. The major driving 
mechanism in all this change is the continual improvement in fabrication technology. This 
evolutionary process results in the feasibility of manufacturing smaller and smaller devices 
as time passes. By including the effects of scaling down device dimensions as an essential 
ingredient of all topics in this text, many of the important changes of the architectural 
parameters of the technology are predicted, expected, and indeed hoped for. 

The key concepts of this text are illustrated by way of specific examples. In any given 
technology, form follows function in a particular way. The skill of mapping function into 
form, once acquired, can be readily applied to any technology. Because of its density, speed, 
topological properties, and general availability of wafer fabrication, nMOS has been chosen 
as the technology in which examples are implemented. 

An atmosphere of excitement and anticipation pervades this field. Workers from many 
backgrounds, computer scientists, electrical engineers, and physicists, are collaborating on a 
common problem area which has not yet become classical. The territory is vast, and largely 
unexplored. The rewards are great for those who simply press forward. 


Carver A. Mead 
Pasadena, California 


Lynn A. Conway 
Belmont, California 



Background 


Prior to the commercial publication of this textbook, this limited printing of the first five 
chapters is being distributed to a selected group of universities for use as course notes in 
graduate courses on integrated systems. Copies are also being distributed to the industrial 
participants in the Caltech Silicon Structures Project, and to selected individuals in 
universities and in industry for their review. 

The authors welcome any and all comments and suggestions from readers. We are especially 
interested in hearing of the experiences of those teaching from the text. Notifications and 
corrections of errors, ideas for improvements in the tutorial techniques used, and suggestions 
of instructive problems for each chapter would be appreciated. 

While the material in this text is presented in a particular order, it need not be read in that 
order. Each chapter presents material from a distinct level in the hierarchy of disciplines 
involved in integrated systems. We recommend that the reader start in the chapter where he 
or she is most knowledgeable, and read until information is required from an adjacent area 
described in some other chapter. By using this algorithm and consulting the reading 
references where necessary, the reader can gradually work through the primary material of 
all chapters. Although much of the material in this text is previously unpublished, it 
contains only fundamental concepts. However, these concepts cover a wide range of 
disciplines, and are easily conceptualized only after the overall context becomes clear. 

This text has its origins in a series of courses in integrated circuit design given by Carver 
Mead at Caltech, beginning in 1970. Starting in 1971, students in these courses designed and 
debugged their own integrated circuits. The students undertook increasingly complex system 
designs, using only rather simple implementation aids. The structured design methodology 
presented in this text evolved from this milieu. 

A separate Computer Science department was formed at Caltech in 1976, with integrated 
systems as a focus. An early association was formed with systems architects in industry. 
Interaction of Caltech students and faculty with industrial researchers resulted in the idea of 
a joint Caltech-industry cooperative program. The initial industrial participants in this 
program, known as the Caltech Silicon Structures Project, are Intel, DEC, IBM, Hewlett- 
Packard, and Xerox. 

Work on this text began in August 1977. The first three chapters were used as course notes 
during the fall of 1977, in courses given by Carver Mead at Caltech and by Carlo Sequin at 
U. C. Berkeley. This present printing is being used during the spring of 1978 in courses 
given by Ivan Sutherland at Caltech, by Robert Sproull at Carnegie-Mellon University, and 
by Fred Rosenberger at Washington University, St. Louis. 
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Sections: 

The MOS Transistor - - - The Basic Inverter - - - Inverter Delay - - - Parasitic 
Effects - - - Driving Large Capacitive Loads - - - Space vs Lime - - - Basic NAND and 
NOR Logic Circuits - - - Super Buffers - - - A Closer Look at the Electrical 
Parameters - - - Depletion Mode vs F.nhancement Mode Pullups - - - Delays in Another 
Form of Logic Circuitry - - - Pullup/Pulldown Ratios for Inverting Logic Coupled by Pass 
Transistors - - - transit limes and Clock Periods - - - Properties of Cross Coupled 
Circuits-Fffects of Scaling Down the Dimensions of MOS Circuits and Systems 


In this chapter we begin with a discussion of the basic properties of the n-channel, metal- 
oxide-semiconductor (MOS), field effect transistor (FET). We then describe and analyze a 
number of circuits composed of interconnected MOS field effect transistors. The circuits 
described are typical of those we will commonly use in the design of integrated systems. 
The analysis, though highly condensed, is conceptually correct and provides a basis for the 
solution of most system problems typically encountered. 

Integrated systems in MOS technology contain three levels of conducting material separated 
by intervening layers of insulating material. Proceeding from top to bottom, these levels are 
termed the metal, the polysilicon, and the diffusion levels respectively. Patterns for paths 
on these three levels, and the locations of contact cuts through the insulating material to 
connect certain points between levels, are transferred into the levels during the fabrication 
process from masks similar to photographic negatives. The details of the fabrication process 
will be discussed in chapter 2. 

In the absence of contact cuts through the insulating material, paths on the metal level may 
cross over paths on the polysilicon or diffusion levels with no significant functional effect. 
However, wherever a path on the polysilicon level crosses a path on the diffusion level, a 
transistor is created. Such a transistor has the characteristics of a simple switch, with a 
voltage on the polysilicon level path controlling the flow of current in the diffusion level 
path. Circuits composed of such transistors, interconnected by patterned paths on the three 
levels, form our basic building blocks. With these basic circuits, we will architect 
integrated systems, to be fabricated on the surface of monolithic crystalline chips of silicon. 
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The MOS Transistor 

An MOS transistor will be produced on the integrated system chip wherever a polysilicon 
path crosses a diffusion path, as shown in figure la. The electrical symbol used to represent 
the MOS transistor in our circuit diagrams is shown in figure lb, along with symbols and 
polarities of certain voltages of interest. Note that the source and drain terminals of the 
device are physically symmetrical. For the n-channel MOSFETs, these terminal labels are 
assigned such that V ds is normally positive. A more detailed view of the rectangular region 
called the gate, where the polysilicon (poly) crosses the diffusion, is given in figure 1c. 
During fabrication the diffusion paths are formed after the poly paths are formed, as 
explained more fully in chapter 2. The poly gate and thin layer of oxide under masks the 
region under the gate during diffusion, thus interrupting the diffusion path under the gate. 
Therefore, there is no direct diffusion connection between the source and drain terminals of 
the transistor. Notice in this discussion that metal, poly, and diffusion paths all conduct 
electricity well enough to be considered "wires" until further notice. 

In the absence of any charge on the gate, the drain to source path through the transistor is 
like an open switch. The gate, separated from the substrate by the layer of thin oxide, 
forms a capacitor. If sufficient positive charge is placed on the gate so that V gs exceeds a 
threshold voltage V^, electrons will be attracted to the region under the gate to form a 
conducting path between drain and source. Most of the transistors we will use in our 
systems have threshold voltages greater than zero. These are called enhancement mode 
MOSFETs, and their threshold voltage typically equals ~ 0.2(VDD), where VDD is the 
positive supply voltage for the particular technology. 

The basic operation performed by the MOS transistor is to use charge on its gate to control 
the movement of negative charge between source and drain through the channel under the 
gate. The current from source to drain equals the charge induced in the channel divided by 
the transit time or average time required for an electron to move from source to drain. The 
transit time itself is the distance the electron has to move divided by its average velocity. In 
semiconductors under normal conditions, the velocity is proportional to the electric field 
driving the electrons. The relationship between drain to source current lj s , drain to source 
voltage V ds and gale to source voltage V is sketched in figure Id. For small V ds , the 
transit time r is given by equation 1. 
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Transit time: 

t = L/velocity = L/fiE = L^/[/iV ( j s ] [eq.l] 

The proportionality constant ja is called the mobility of the charge carriers, in this case 
electrons, under the influence of an electric field in the conducting material of the channel 
region. It is a velocity per unit electric field (cinV volt-sec). We shall see that the transit 
time is the fundamental time unit of the entire integrated system. 

The amount of negative charge in transit is just the gate capacitance times the voltage on the 
gate in excess of the threshold voltage. The capacitance of two parallel conductors of area 
A, separated by insulating material of thickness D, equals eA/D. The proportionality 
constant t is called the permittivity of the insulating material, and has a simple 
interpretation. It is the capacitance of parallel conductors of area A = 1 cm , separated by a 
thickness D = 1 cm of the insulator material, and is in the units farad/cm. Therefore, the 
gate capacitance equals eWL/D. Thus the charge in transit is given by eq. 2, and the 
current is given by eq. 3. 

Charge in transit: 

Q = - C g (V gs - V th ) = - £WL(V gs - V th ) [eq.2] 

D 


Current: 

! ds " " 'sd = ‘ charge in tran sit = jxeW( V gs - V lh )(V ( j s ) [eq.3] 
transit time LD 

Note that for small V ( j s , the drain current is proportional to the source-drain voltage and 
also to the gate voltage above threshold. Any device with a current through it proportional 
to the voltage across it, may be viewed as a resistor, and in the case of an MOS device with 
low drain to source voltage, the resistance is controlled by the gate voltage as given in eq. 3a. 


Vds/'ds * R = L 2 /[fC g (V gs - V th )] [eq.3a] 

In both equations 2 and 3a, C g is the gate to channel capacitance of the turned on transistor. 
In the simple case where this transistor is driving the gate of another one identical to it, the 
time response of the system will be an exponential with a time constant RC g . given in 
equation 4. This time constant is identical to the transit time t given in equation 1. 
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RC g = l. 2 /[fi(V gs - V th )] = r [eq.4] 

Although the above equations are greatly simplified, they provide sufficient information to 
make many design decisions which we will face, and also give us insight into the scaling of 
devices to smaller sizes. In particular, the transit time t can be viewed as the basic time 
unit of any system we shall build in the integrated technology. In almost all situations, the 
fastest operation which we can perform is to transfer a signal front the gate of one MOS 
transistor onto the gate of another. The transit time is the minimum time in which a charge 
placed on the gate of one transistor results in the transfer of a similar charge through that 
transistor's channel onto the gate of a subsequent transistor. For example, to transfer a 
charge from one transistor onto two transistors identical to it requires a minimum of two 
transit times. Thus, the transit time of the basic transistor in an integrated system can be 
viewed as the unit of time in which all other times in the system are scaled. Although it is 
a somewhat optimistic approximation, we will use t as the primary time metric in 
calculating the delay through elementary inverting logic stages. More accurate predictions of 
circuit behavior can be produced using any one of a number of available circuit simulation 
programs.^ 

As Vfe is increased, not all of the drain to source voltage is available for reducing the 
transit time. Drain voltage in excess of one threshold below' the gate voltage creates a short 
region of high electric field adjacent to the drain which the carriers cross very quickly. The 
electric field in the major portion of the channel from the source up to this region is 
proportional to Vg S - V^, as shown in figure le. For > (Vg S - V t ^), the drain current 
becomes independent of Further increases in neither increase nor decrease the 
transit time. This range of values is known as saturation. 

In saturation: I^ s = jxeW( Vg S - V t ^)^ [eq.5] 

2LD 

With the exception of the factor of 2 in the denominator, this equation is similar to 
equation 3, with the factor in that equation replaced by its maximum effective value, 

V gs - V[h • The factor of 2 in equation 5 arises from the non-uniformity of the electric 

® 1 R 4 

field in the channel region when in saturation 1 ’ . 
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The Basic Inverter 

The first logic circuit we will describe is the basic digital inverter. Analysis of this circuit is 
then extended to analysis of basic NAND and NOR logic gates. The inverter’s logic 
function is to produce an output which is the complement of its input. When describing 
the logic function of circuits in integrated systems, we assign the value logic-1 to voltages 
equaling or exceeding some defined logic threshold voltage, and logic-0 to voltages less than 
this threshold voltage. 

Were there an efficient way to implement resistors in the MOS technology, we could build a 
basic digital inverter circuit using the configuration of figure 2a. Here, if the inverter 
input voltage Vj n is less than the transistor threshold voltage then the transistor is 
switched off, and V out is "pulled-up" to the positive supply voltage VDD. In this case the 
output is the complement of the input. If Vj p is greater than V th? the transistor is 
switched on and current flows from the VDD supply through the resistor R to GND. If R 
were sufficiently large, V QU ^ could be "pulled-down" well below V^, thus again 
complementing the input. However, the resistance per unit length of minimum width lines 
of various available conducting elements is far less than the effective resistance of the 
switched on MOSFET. Implementing a sufficiently large inverter pullup using resistive 
lines would require a very large area compared to that occupied by the transistor itself. 

To circumvent this problem a depletion mode MOSFET is used as a pullup for the basic 
inverter circuit, symbolized and configured as shown in figure 2b. In contrast to the usual 
enhancement mode transistor, the depletion mode transistor has a threshold voltage, V^, 
that is less than zero. During fabrication, one of the masks is used to select any desired 
subset of transistors in the integrated system for processing as depletion mode transistors. 
For a depletion mode transistor to turn off, it requires a voltage on its gate relative to its 
source that is more negative than V dep . But the depletion mode pullup transistor’s gate is 
connected to its source, and thus it is always turned on. Hence, when the enhancement 
mode transistor is turned off, for example by connecting zero voltage to its gate, the output 
of the inverter will be equal to VDD. We will find that for reasonable ratios of the gate 
geometries of the two transistors, input voltages above a defined logic threshold voltage, 
Vj nv . will produce output voltages below that logic threshold voltage, and vice versa. 
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The top view of the layout of an inverter on the silicon surface is sketched in figure 2c. It 
consists of two polysilicon regions overhanging a path in the diffusion level which runs 
between VDD and GND. This arrangement forms the two MOS transistors of the inverter. 
The inverter input A is connected to the poly forming the gate of the low'er of the two 
transistors. The pullup is formed by connecting the gate of the upper transistor to its source. 
The fabrication details of such connections are described in chapter 2. The output of the 
inverter is shown emerging on the diffusion level, from between the drain of the pulldown 
and the source of the pullup. The pullup is a depletion mode transistor, and it is usually 
several times longer than the pulldown, to achieve the proper inverter logic threshold. 

Figures 3a and 3b show the characteristics of a typical pair of MOS transistors used to 
implement an inverter. The relative locations of the saturation regions of the pullup and 
pulldown differ in these characteristics, due to the difference in their threshold voltages. 

We can use a graphical construct to determine the actual transfer characteristic, V om vs 
Vj n , of the inverter circuit. From figure 2b we see that the V ds (enh) of the enhancement 
mode transistor equals VDD minus V ds (dep) of the depletion mode transistor. Also, 
V ds (enh) equals V out . In a steady state and with no current drawn from the output, the l ds 
of the two transistors are equal. Since the pullup has its gate connected to its source, only 
one of its characteristic curves is relevant, namely the one for Vg S (dep) = 0. Taking these 
facts into account, we begin the graphical solution (fig. 3c) by superimposing plots of 
l ds (enh) vs V ds (enh), and the one plot of l ds (dep) vs C VDD ‘ V ds (dep)]. Since the currents 
in both transistors must be equal, the intersections of these sets of curves yields V ds (enh) = 
V om versus V gs (enh) = Vj n . The resulting transfer characteristic is plotted in figure 3d. 

Studying figures 3c and 3d, consider the effect of starting with Vj n = 0 and then gradually 
increasing V (n towards VDD. While the input voltage is below' the threshold of the 
pulldown transistor, no current flows in that transistor, the output voltage is constant at 
VDD, and the drain to source voltage across the pullup transistor is equal to zero. When 
Vj n is first increased above the enhancement mode threshold, current begins to flow in the 
pulldown transistor. The output voltage decreases slowly as the input voltage is first 
increased above V t u. Subsequent increases in the input voltage rapidly lower the pulldown’s 
drain to source voltage, until the point is reached where the pulldown leaves its saturation 
region and becomes resistive. Then as Vj n continues to increase, the output voltage 
asymptotically approaches zero. The input voltage at which V m = V 0l|l is known as the 
logic threshold voltage V jnv . Figure 3d also shows the effect of changes in the transistor 
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length to width ratios on the transfer characteristics and on the logic threshold voltage. The 
resistive impedance of the MOS transistor is proportional to the length to width ratio Z of 
its gate region. Using the subscript pu for the pullup transistor and pd for the pulldown 
transistor: If Z pu = L pu /W pu is increased relative to Z pd = l pc j /W pd’ lhen V inv decreases, 
and vice-versa. The gain, or negative slope of the transfer characteristic near Vj nv , 
increases as Z pu ZZ pd increases. The gain G must be substantially greater than unity for 
digital circuits to function properly. 


Inverter Logic Threshold Voltage 

The most fundamental property of the basic inverter circuit is its logic threshold voltage, 
Vj v . The logic threshold here is not the same as V th of the enhancement mode transistor, 
but is that voltage on the input of the enhancement mode transistor which causes an equal 
output voltage. If V m is increased above this logic threshold, v out falls below it, and if 
Vj n is decreased below Vj nv , V oul rises above it. The following simple analysis assumes 
that both pullup and pulldown are in saturation, so that equation 3 applies. Usually the 
pullup is not quite in saturation, but the following is still nearly correct. Vj nv is 
approximately that input voltage which would cause saturation current through the pulldown 
transistor to be equal to saturation current through the pullup transistor. Referring to eq.5, 
we find the condition for equality of the two currents given in eq.6. 


Currents equal when: 

Wpd (V inv - V lh ) 2 = WEU_(-V dep ) 2 , [eq.6] 

Lpd Lpu 

or thus when: 

V inv * v th - WCZpu/Zpd]* [eq.6a] 

Here we note that the current through the depletion mode transistor is dependent only on its 
geometry and threshold voltage V dep , since its V gs = 0. Note that V jnv is dependent upon 
the thresholds of both the enhancement and depletion mode transistors, and also the square 
root of the ratio of the Z = l./W of the enhancement mode transistor to that of the 
depletion mode transistor. 
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Tradeoffs are possible between these threshold voltages and the areas and current driving 
capability of transistors in the system’s inverters. To maximize (V gs * V th ) and increase the 
pulldowns’ current driving capability for a given area, should be as low as possible. 
However, if V th is too low, inverter outputs won’t be driveable below V L j v and inverters 
won’t be able to turn off transistors used as simple switches. The original choice of ~ 
0.2VDD is a reasonable compromise here. 

Similarly, to maximize the current driving capability of pullups of given area, we might set 
the system's V dep as far negative as possible. However, eq. 4a shows that for chosen Vj nv 
and V lh , decreasing V dep requires an increase in L pu /W pu , typically leading to an increase 
in pullup area. The compromise made in this case is usually as follows. The negative 
threshold of depletion mode transistors is set during fabrication such that with gate tied to 
source, they turn on approximately as strongly as would an enhancement mode transistor 
with VDD connected to its gate and its source grounded. In other words, depletion mode 
and enhancement mode transistors of equal gate dimensions would have equal drain to 
source currents under those conditions. Applying eq.5 in those conditions we find that: 

(- v dep> 2 ~ < VDD - V th> 2 - 

Therefore, -V dep ~ (VDD - V th ), and V dep ~ -0.8VDD. While adjustments in the 
details of this choice are often made in the interest of optimization of processes for a 
particular product, we will assume here this approximate equality of turn-on voltages of the 
two transistor types for the sake of simplicity. Substituting this choice of V dep into eq.6a, 
we find that for V lh small compared to VDD: 

v jnv ~ VDD/[Zpu/Zpd]'^ [eq.7] 

In general it is desirable that the margins around the inverter threshold be approximately 
equal, i.e., that the inverter threshold, V. , lie approximately midway between VDD and 
ground. We see from eq.7 that this criterion is met by a ratio of pullup Z to pulldown Z of 
approximately 4:1. We will see later that the choice of V dep ~ VDD - V t ^, producing a 
ratio of 4:1 here, will lead to a balancing of performances in certain other important 
circuits. 
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Inverter Delay 

A minimum requirement for an inverter is that it drive another identical to itself. Let us 
analyze the delay through a string of inverters of identical dimensions since this is the 
simplest case in which we can estimate performance. Inverters connected in this way are 
shown in Fig. 4a. We define the the inverter ratio k as the ratio of Z of the pullups to Z of 
the pulldowns. Note that we will sometimes use the alternative pullup symbol shown 
("resistor with gate"), to clarify its functional purpose. 

Let us assume that prior to t = 0, the voltage at the input of the first inverter is zero, and 
hence, the voltage output of the second inverter will be low. At time t=0, let us place a 
voltage equal to VDD on the input of the first inverter and follow the sequence of events 
which follows. The output of the first inverter, which leads to the gate of the second 
inverter, will initially be at VDD. Within approximately one transit time, the pulldown 
transistor of the first inverter will remove from this node an amount of charge equal to 
VDD times the gate capacitance of the pulldown of the second inverter. The pullup 
transistor of the second inverter is now faced with the task of supplying a similar charge to 
the gate of the third inverter, to raise it to VDD. Since it can supply at most only 1/k'th of 
the current that can be supplied by the pulldown transistor, the delay in the second inverter 
stage is approximately k times that of the first. 

It is thus convenient to speak of the inverter pair delay which includes the delay for one 
lowgoing transition and one highgoing transition. This inverter pair delay is approximately 
(1+k) times the transit time, as shown in figure 4a. The fact that the rising transition is 
slower than the falling transition by approximately the inverter transistors' geometry ratios 
is an inherent characteristic of any ratio type logic. It is not true of all logic families. For 
example, in families such as complementary MOS where there are both pMOS and nMOS 
devices on the same silicon chip and both types operate strictly as pulldown enhancement 
mode devices, any delay asymmetry is a function of the difference in mobilities of the p 
and n type charge carriers rather than of the transistor geometrical ratios. 

Fig. 4b shows an inverter driving the inputs of several other inverters. In this case, for a 
fanout factor f, it is clear that in either the pullup or pulldown direction, the active device 
must supply f times as much charge as it did in the case of driving a single input. In this 
case, the delay both in the up and downgoing directions is increased by approximately the 
factor f. In the case of the downgoing transition, the delay is approximately f times the 
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transit time of the pulldown transistor, and in the case of the upgoing transition, the delay 
is approximately the inverter ratio k times the fanout factor times the pulldown transit time. 

In the discussions of transit lime given earlier, it was assumed that both the depletion mode 
pull up device and the enhancement mode pulldown device were operating in the resistive 
region. It was also assumed that all capacitances were constant, and not a function of 
voltage. These conditions are not strictly met in the technology w'e are discussing. Delay 
calculations given in this text are based on a "switching moder where individual stages 
spend a small fraction of their time in the mid-range of voltages around V inv . This 
assumption introduces a small error of the order of 1/G. Because of these and other second 
order effects, the switching times actually observed vary somewhat from those derived. 

Parasitic Effects 

In integrated systems, capacitances of circuit nodes are due not only to the capacitance of 
gates connected to the nodes, but also include capacitances to ground of signal paths 
connected to the nodes and other stray capacitances. These other capacitances, sometimes 
called parasitic or stray capacitances, are not negligible. While gate capacitances are 
typically an order of magnitude greater per unit area than the capacitances of the signal 
paths, the signal paths are often much larger in area than the associated gate regions. 
Therefore, a substantial fraction of the delay encountered may be accounted for by stray 
capacitance rather than by the inherent properties of the active transistors. In the simplest 
case where the capacitance of a node is increased by the presence of parasitic area attached 
to the node, the delays can be accounted for by simply increasing the transit time by the 
ratio of the total capacitance to that of the gale of the transistor being driven. Time is 
required to supply charge not only to the gate itself but also to the parasitic capacitance. 

There is one type of parasitic, however, which is not accounted for so simply. All MOS 
transistors have a parasitic capacitance between the drain edge of the gate and the drain 
node. This effect is shown schematically in figure 4c. In an inverter string, this capacitance 
wall be charged in one direction for one polarity of input, and in the opposite direction for 
the opposite polarity input. Thus, on a gross scale its effect on the system is twice that of 
an equivalent parasitic capacitance to ground. Therefore, gate to drain capacitances should 
be approximately doubled, and added to the gate capacitance Cg and the stray capacitances, 
to account for the total capacitance of the node and thus for the effective delay time of the 
inverter. The effective inverter pair delay then = t(1 +k)C lola j/Cg. 
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Driving Large Capacitive Loads 

As we have seen, the delay per inverter stage is multiplied by a fanout factor. The overall 
performance of a system may be seriously degraded if it contains any large fanouts, where 
one circuit within the system is required to drive a large capacitive load. As we shall see, 
this situation often occurs in the case of control drivers which are required to drive a large 
number of inputs to memory cells or logic function blocks. A similar and even more 
serious problem is driving wires which go off the silicon chip to other chips or 
input/output devices. In such cases the ratio of the capacitance which must be driven to the 
inherent capacitance of a gate circuit on the chip is often many orders of magnitude, 
causing a serious delay and a degradation of system performance. 

Consider how we may drive a capacitive load C L in the minimum possible time given that 
we are starting with a signal on the gate of an MOS transistor of capacitance C g . Define the 
ratio of the load capacitance to the gate capacitance, C^/Cg, as Y. It seems intuitively clear 
that the optimum way to drive a large capacitance is to use our elementary inverter to drive 
a larger inverter and that larger inverter to drive a still larger inverter until at some point 
the larger inverter is able to drive the load capacitance directly. Using an argument similar 
to the fanout argument it is clear that for one inverter to drive another inverter, where the 
second is larger in size by a factor of f, results in a delay f times the inherent inverter delay, 
r. If N such stages are used, each larger than the previous by a factor f, then the total delay 
of the inverter chain is Nfr, where f N equals Y.. Note that if we use a large factor f, we can 
get by with few stages, but each stage will have a long delay. If we use a smaller factor f, we 
can shorten the delay of each stage, but are required to use more stages. What value of N 
minimizes the overall delay for a given Y? We compute this value as follow's: 

Since f N = Y, ln(Y) = N ln(f) 

Delay of one stage = fr , 

Thus total delay is = Nfr = ln(Y)[ f/ln(f) ]t [eq.8] 

Notice that the delay is always proportional to ln(Y), a result of the exponential growth in 
successive stages of the driver. The multiplicative factor. f/ln(f), is plotted its a function of 
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f in figure 5, normalized to its minimum value (e). Total delay is minimized when each 
stage is larger than the previous one by a factor of e, the base of natural logarithms. 
Minimum total delay is the elementary inverter delay t times e times the natural logarithm 
of the ratio of the load capacitance to the elementary inverter capacitance. 

Min. total delay - re[ln(CjyCg)] [eq.9] 

Minimum delay through the driver is seldom the only design criterion. The relative time 
penalty introduced by the choice of other values of f can be read directly from figure 5. 


Space vs Time 

From the results of the sections on inverter delay, parasitic effects, and driving large 
capacitances, we see that areas and distances on the silicon surface trade off against delay 
times. For an inverter to drive another inverter some distance away, it must charge not only 
the gate capacitance of the succeeding inverter but also the capacitance to ground of the 
signal path connecting the two. Increasing the distance between the two inverters will 
therefore increase the inverter pair delay. This effect can be counterbalanced by increasing 
the area of the first inverter, so as to reduce the ratio of the load capacitance to the gate 
capacitance of the first inverter. But the delay of some previous driving stage is then 
increased. There is no way to get around the fact that transporting a signal from one node 
to another some distance away requires charging or discharging capacitance, and therefore 
takes time. Note that this is not a velocity of light limitation as is often the case outside 
the chip. The times are typically several orders of magnitude longer than those required for 
light to traverse the distances involved. To minimize both the time and space required to 
implement system functions, we will tend to use the smallest possible circuits and locate 
them in ways which tend to minimize the interconnection distances. 

The results of a previous section can be used here to illustrate another interesting space vs 
time effect. Suppose that the minimum size transistors of an integrated system have a 
transit time r and gate capacitance C g . A minimum size transistor within the system 
produces a signal which is then passed through successively larger inverting logic stages and 
eventually drives a large capacitance with minimum total delay equal to t ninr With the 
passage of lime, fabrication technology improves and we replace ihe system with one in 
which the circuits are all scaled down in size by dividing by a factor a (the motivation for 
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this is clear: the new system may contain a 1 as many circuits). As described in a later 
section, we will find that the transit times of the smallest circuits will now be r - t/ a, and 
their gate capacitance will be C g ' = C g /a. Referring to equation 5a., we find that the new 
minimum total delay, t m j n ' to drive C L scales less favorably: 

^min = t mint^^ a ^ n ^ a ^ 

Therefore, as the inverters scale down and r gets smaller, more inverting logic stages are 
required to obtain the minimum offehip delay. Thus the relative delay to the outside world 
becomes larger. However, the absolute delay becomes smaller. 


Basic NANI) and NOR Logic Circuits 

NAND and NOR logic circuits may be constructed in integrated systems as simple 
expansions of the basic inverter circuit. The analysis of the behavior of these circuits, 
including their logic threshold voltages, transistor geometry ratios and time delays, is also a 
direct extension of the analysis of the basic inverter. 

The circuit layout diagram of a two input NAND gate is shown in figure 6a. The layout is 
that of a basic inverter with an additional enhancement mode transistor in series with the 
pulldown transistor. NAND gates with more inputs may be constructed by simply adding 
more transistors in series with the pulldown path. The electrical circuit diagram, truth table 
and logic symbol for the tw'o input NAND gate are shown in figure 6b. If either of the 
inputs A or B is a logic-0, the pulldown path is open and the output will be high, and 
therefore a logic-1. For the output to be driven low, to logic-0, both inputs must be high, at 
logic-1. The logic threshold voltage of this NAND gate is calculated in a similar manner to 
that of the basic inverter, except equation 7 is rewritten with the length of the pulldowns 
replaced with the sum of the lengths of the two pulldowns (assuming their wddths are equal) 
as follows: 


v thNAND ~ Vl)D/[(l.pii/Wpu)/((I.pd a +Lpd b )/Wpd)] 1 ' 4 

This equation indicates that as pulldowns are added in series to form NAND gale inputs, 
the pull up length must he enlarged to hold the logic threshold voltage constant. 
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The logic threshold voltage of an n-input NAND gate, assuming all the pulldowns have 
equal geometries, is: 

v thNAND ~ VDD/[(Lpu/Wpu)/(nLpd/Wpd)] 1/z 

As inputs are added and pull up length is increased, the delay time of the NAND gate is also 
correspondingly increased, both for rising and falling transitions. 

t nand ~ nT inv 

The circuit layout diagram of a two input NOR gate is shown in figure 6c. The layout is 
that of a basic inverter with an additional enhancement mode transistor in parallel with the 
pulldown transistor. Additional inputs may be constructed by simply placing more 
transistors in parallel with the pulldown path. The circuit diagram, truth table and logic 
symbol for the two input NOR gate are shown in figure 6d. If either of the inputs A or B 
is a logic-1, the pulldown path to ground is closed and the output will be low, and therefore 
a logic-0. For the output to be driven high, to logic-1, both inputs must be low, at logic-0. 
If one of its inputs is kept at logic-0, and the other swings between logic-0 and logic-1, the 
logic threshold voltage of the NOR gate is the same as that of a basic inverter of equal 
pullup to pulldown ratio. If this ratio were 4:1 to provide equal margins, then V t ^ N0R ~ 
VDD/2 with only one input active. However, if both pulldowns had equal geometries, and if 
both inputs were to move together between logic-0 and logic-1, VthNOR w r ould be reduced 
to ~ VDD/(8) 1/2 . The logic threshold voltage of an n-input NOR circuit decreases as a 
function of the number of active inputs (inputs moving together from logic-0 to logic-1). 
The delay time of the NOR gate with one input active is the same as that of an inverter of 
equal transistor geometries, except for added stray capacitance. Its delay time for failing 
transitions is decreased as more of its inputs are active. 
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Super Buffers 

As we have noted, ratio type logic suffers from an asymmetry in its ability to drive 
capacitive loads. This asymmetry results from the fact that the pullup transistor has of 
necessity less driving capability than the pulldown transistor. There are, however, methods 
for avoiding this asymmetry. Shown in figures 7a and 7b are circuits for inverting and 
non-inverting drivers which are approximately symmetrical in their capability of sourcing 
or sinking charge into a capacitive load. Drivers of this type are called super buffers. 

Both types of super buffer are built using a depletion mode pullup transistor and an 
enhancement mode pulldown transistor, with a ratio of Z’s of approximately 4:1 as in the 
basic inverter. However, the gate of the pullup transistor, rather than being tied to its 
source, is tied to a signal which is the complement of that driving the pulldown transistor. 
When the pulldown transistor gate is at a high voltage, the pullup transistor gate will be 
approximately at ground, and the current through the super buffer will be similar to that 
through a standard inverter of the same size. However, when the gate of the pulldown 
transistor is put to zero, the gate of the pullup transistor will go rapidly to VDD since it is 
the only load on the output of the previous inverter, and the depletion mode transistor will 
be turned on at approximately twice the drive which it would experience if its gate were tied 
to its source. Since the current from a device in saturation goes approximately as the square 
of the gate voltage, the current sourcing capability of a super buffer is approximately four 
times that of a standard inverter. Hence, the current sourcing capability of its pullups is 
approximately equal to the current sinking capability of its pulldowns, and wave forms from 
super buffers driving capacitive loads are nearly symmetrical. 

The effective delay time, t, of super buffers is thus reduced to approximately the same value 
for highgoing and lowgoing wave forms. Needless to say, when large capacitive loads are to 
be driven, super buffers are universally used. The arguments used in the last section to 
determine how many stages are used to drive a large capacitive load from a small source 
apply directly to super buffers. For that reason we have not explicitly indicated an inverter 
raiio k in that section 
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A Closer Look at the Llcctrical Parameters 

Up to this point we have talked in very simple terms about the properties of the MOS 
transistors. They have a capacitance associated with their gate input and a transit time for 
electrons to move from the source to the drain. We have given simple expressions for the 
drain to source current. For very low V d$ , the MOS transistor's drain to source path acts as 
a resistor whose conductance is directly proportional to the gate voltage above threshold, as 
given in equation 3. For V ds larger than V gs - V th , the device acts as a current source, with 
a current proportional to (V gs - V t ^, as given in equation 5. As V £ j s passes through the 
intermediate range between these two extremes, there is a smooth transistion between the 
two types of behavior, as given in equation 10.^ 


I ds = Q/t = ,iCg[(V gs - V lh ) V ds - (V ds 2 )/2]/L 2 [eq.10] 

Figure 9a plots 1^ vs V^ s summarizing the various regions of MOS transistor operation. 

There is another electrical characteristic we may occasionally have to take into account. The 
threshold voltage of an MOS transistor is not a constant, but varies slightly as a function of 
the voltage between the source terminal of the transistor and the silicon substrate, which is 
usually at ground. This so called body effect is illustrated in figure 9b. If the source to 
bulk (substrate) voltage, V sb , equals zero, then is at its minimum value of 

approximately 0.2 VDD. As V st) is increased, V th increases slightly. For typical processes, 
reaches a maximum value of about 0.3 VDD for ~ VDD. Vj e p is similarly 
affected, ranging from about -0.8 VDD to -0.7 VDD as the V s ^ of a depletion mode 
MOSFFT is raised from zero to VDD volts. As shown in figure 9b, it is possible to insert a 
fixed bias voltage between the circuit ground and the substrate (rather than just connect 
them). Such a substrate bias provides an electrical mechanism for setting the threshold to 
an appropriate value. 



[Ch. 1Sect.2] 


plot of 

Vds = Vgs - Vth 


Ids 



(a) Cutoff Region; 

Vgs < Vth . Ids = 0 

(b) Saturation Region: 

Vgs >_ Vth , Vds sufficiently high so 
Vgd < Vth . (i.e.: Vds >. Vgs-Vth) 

MOSFET acts as current source , with 
Ids proportional to (Vgs - Vth/ 

(c) Resistive Region: 

Vgs >_ Vth , Vds sufficiently low so 
Vgd >. Vth . (i.e.: Vds < Vgs - Vth) 
MOSFET acts as resistor , with resistance 
inversely proportional to (Vgs - Vth) 


Fig. 9a. Summary of MOS Transistor Characteristics 


i Conductance 




Fig. 9b. The Body Effect 


(mos l >.press) 





fCli.I: MOS Devices and Circuits: Sect.2] <Comvay>mos2.vlsi January 27, 1978 1:46 PM 


Depletion Mode vs Fnliancenient Mode Fullups 

With its gale tied to VDD, an enhancement mode transistor will be on for all V^ s > V th , 
and thus can be used for a pullup device in inverting logic circuits, Farly MOS processes 
used pullup devices of exactly this type. 

In this section we will make a comparison of the rising transients of the two types of pullup 
circuits. As noted earlier, rising transients in ratio type logic are usually slower than falling 
transients, and thus rising transients generally have greater impact on system performance. 
In the simplest cases, this asymmetry in the transients results from the current sourcing 
capability of the pullup transistor being less than that of its pulldown counterpart. The 
simple intuitive time arguments given earlier are quite adequate for making estimates of 
system performance in most cases. However, there are situations in which the transient time 
may be much longer than a naive estimate would indicate. The rising transient of the 
enhancement mode pullup is one of these. 

A depletion mode pullup transistor feeding a capacitive load is shown schematically in 
figure 10a. Since Vg S > and V gcJ > V lh , the pullup transistor is in the resistive region. 
The final stages of the rising transient are given by the following exponential: 

V(i) = VDD[1 - e _t/ ( RC L)] 

Where for inverter ratio k, and pulldown transit time r and gate capacitance C g the 
time-constant of the rising transient is given by: 

RC l = krC L /C g 

A somewhat more complicated situation is presented by an enhancement mode transistor 
sourcing charge into a capacitive load. This situation is shown schematically in Fig. 10b. 
Note that since V gc j = 0, the transistor is in saturation whenever V gs > V^. The problem 
with sourcing charge from the enhancement mode transistor is that as the voltage at the 
output node gels closer and closer to one threshold below VDD, the amount of current 
provided by the enhancement mode transistor decreases rapidly. 
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The dependence of the enhancement mode puilup current upon the output voltage V is 
given in eq. 11. 


Q = - tWL [(VDD - V th ) - V] 
D 

r = 2L 2 //i[(VDD - V th ) - V] 


Ids = - O/t = ueW F(VDD - V th ) - V] 2 [eq.ll] 

2LD 

The fact that the puilup current decreases as the output voltage nears its maximum value 
causes the rising transient from such a circuit to be of qualitatively different form than that 
of a depletion mode puilup. Equating l ds = C L dV/dt with the expression in equation 11, 
and then solving for V(t), we find the rising voltage transient, for large t: 


V(t) = VDD - V th - - C L LD [eq.12] 

fl£ Wt 

Note that in this configuration, the threshold voltage Vth’ of the puilup is near its 
maximum value as V(t) rises towards VDD, due to the body effect. A comparison of the 
rising transients of the preceding two circuits, assuming the same load capacitance and the 
same puilup source current at zero output voltage, is shown in Fig. 10c. The rising transient 
for the depletion mode puilup transistor is crisp and converges rapidly towards VDD. 
However, the rising transient for the enhancement mode puilup transistor, while starting 
rapidly, lags far behind and within the expected time response of the system, never even 
comes close to one threshold below VDD. Even for very large t, V(t) < VDD - V^*. The 
practical effect of this property of enhancement mode transistors is that circuits designed to 
work from a voltage derived from the output of such a circuit should be designed with an 
inverter threshold V. v considerably low'er than that of circuits designed to work wdth the 
output of a depletion mode puilup circuit, in order to obtain equal inverter margins 
without sacrificing performance, we will normally use depletion mode pullups. 
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Delays in Another Form of Logic Circuitry 

enhancement mode transistors, when used in small numbers and driving small capacitive 
loads, may often be used as switches in circuits of simple topology to provide logic signal 
steering functions of much greater complexity than could be easily achieved in ratio type 
inverting logic. These circuits are reminiscent of relay switching logic, and transistors used 
in this way are referred to as ’’pass transistors" or "transmission gates". Example circuits 
using this type of design are given in Chapter 3. A particularly interesting example is the 
Manchester carry chain 43 ’' 5 , used for propagating carry signals in parallel adders. In each 
stage of the adder a carry propagate signal is derived from the two input variables to the 
adder, and if it is desired to propagate the carry, this propagate signal is applied to the gate 
of an enhancement mode pass transistor. The source of the transistor is carry-in to the 
present stage, and the drain of the transistor is carry-out to the next stage. In this way, a 
carry can be propagated from less to more significant stages of the adder without inserting a 
full inverter delay between stages. The circuit is shown schematically in Fig. 11a. 

The delay through such a circuit does not involve inverter delays but is of an entirely 
different sort. A voltage along the chain divides into across each pass transistor. Thus 
V (ls is usually low, and the pass transistors operate primarily in the resistive region. We can 
think of each transistor as a series resistance in the carry path, and a capacitance to ground 
formed by the gate to channel capacitance of each transistor, and the strays associated with 
the source, drain, and connections with the following stage. An abstraction of the electrical 
representation is shown in Fig. lib. The minimum value of R is the turned on resistance of 
each enhancement mode pass transistor, while the minimum value of C is the capacitance 
from gate to channel of the pass transistor. Strays will increase both values, especially that 
of C. The response at the node labelled V 2 with respect to time is given in eq. 13. In the 
limit as the number of sections in the network becomes large, eq. 13 reduces to the 
differential form shown in eq. 14 where R and C are now the resistance and capacitance per 
unit length, respectively. 


C dV 2 /dt = [(V 1 - V 2 ) - (V 2 - V 3 ]/R 


[eq.13] 


RC dV/dt = d 2 V/dx 2 


[eq. 14] 
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Equation 14 is the well-known diffusion equation, and while its solutions are complex, in 
general the time required for a transient to propagate a distance x in such a system is 
proportional to x^. One can see qualitatively that this might be so. Doubling the number of 
sections in such a network doubles both the resistance and the capacitance, and therefore 
causes the time required for the system to respond to increase by a factor of approximately 
four. The response of a system of n stages to a step function input is shown in Fig. 11c. 

If we add one more pass transistor to such a chain of n pass transistors, the added delay 
through the chain is small for small n, but very large for large n. Therefore, it is highly 
desirable to group the pass transistors used for steering, multiplexing, and carry-chain type 
logic into short sections and interpose inverting logic between these sections. This approach 
applied to the carry chain is shown in figure lid. The delay through a section of n pass 
transistors is proportional to RCn". Thus the total delay ~ RCn^ plus the delay through the 
inverter r jnv . The average delay per stage is given in eq. 15. To minimize the delay per 
stage, chose n such that the delay through n pass transistors equals the inverter delay: 

Total delay ~ RCn^ + Tj nv 

Average delay/stage ~ RCn + Tj nv /n [eq.15] 

Min. delay when: RCn z - Tj nv 

Since logic done by steering signals with pass transistors does not require static power 
dissipation, a generalization of this result may be formulated. It pays to put as much logic 
into steering type circuits as possible until there are enough pass transistors to delay the 
signal by approximately one inverting logic delay. At this point, the level of the signal can 
be restored by an inverting logic stage. 

The pass transistor has another important advantage over an inverting logic stage. When 
used to control or steer a logic signal, the pass transistor has only an input, control, and 
output connections. A NAND or NOR logic gate implementing the same function, in 
addition to containing two more transistors and thus occupying more area, also requires 
VDD and GND connections. As a result, the topology of interconnection of pass transistor 
circuits is far simpler than that of inverting logic circuits. This topological simplicity of 
pass transistor control gates is an important factor in the system design concepts developed 
in later chapters. 
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Fig. 11a. Pass Transistor Chain Propagating a Carry Signal 
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Fig. lib. Equivalent Circuit 



Fig. lid. Minimizing Delay by Interposing Inverters 


[Cli.l: MOS Devices and Circuits: Sect.21 <Conway>mos2.vlsi January 27, 1978 l:4h PM 


Pullup/Pulldown Ratios for Inverting Logic Coupled by Pass transistors 

Early in this chapter we found that when an inverting logic stage directly drives another 
such stage, a pull up to pulldown ratio Z piJ /Zp d = (Lp U /Wp U )/(Lp d /W pd ) of 4:1 y ieIds 
equal inverter margins and also provides an output sufficiently less than for an input 
equal to VDD. Rather than coupling inverting logic stages directly, we often couple them 
with pass transistors for the reasons developed in the preceding section, thus affecting the 
required pullup to pulldown ratio. 

Figure 12a shows two inverters connected by a pass transistor. If the output of the first 
inverter nears VDD, the input of the second inverter can rise at most to (VDD - V th p), 
where V^p is the threshold of the pass transistor. Even with VDD on its gate, the pass 
transistor opens the connecting path once its input side rises above (VDD - V^p). For the 
second inverter to have its output driven as low with an input of (VDD - V thp ) as would a 
standard inverter with an input of VDD, it must have a larger pullup to pulldown ratio, 
calculated as follows. 

With inputs near VDD, the pullups of inverters are in saturation, and the pulldowns are in 
the resistive region. Figure 12b shows equivalent circuits for two inverters, one with VDD as 
input, the other with (VDD - V t j np ). For their output voltages to be equal under these 
conditions, I^Rj must equal 12*^2 5 anc * substituting from eq.3a and eq.5 we find: 

( Z pul /Z pdl>< Vl)D - V .h> - < Z pu2 ZZ pd2>< VDD - V th - V lhp> 

Since V^ of the pulldowns ~ 0.2VDD, and V^p of the pass transistor ~ 0.3VDD due to the 
body effect, then Z pu 2 /Zp d2 ~ 2Z pul /Z pdl . Thus a ratio (Lp U /Wp U )/(L pd /W pd ) = 8 is 
usually used for inverting logic stages placed as level restorers between sections of pass 
transistor logic. 
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Fig. 12a. Inverters Coupled by Pass Transistor 



Fig. 12b. For Vout2 = Voutl, Zpu2/Zpd2 = 8 
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Transit Times and Clock Periods 

In chapter 3 we will develop a system design methodology in which we will be able to 
construct and estimate the performance of arbitrarily complex digital systems, using only 
the basic circuit forms presented in the preceding sections. The basic system building block 
in the design methodology is a register to register transfer through combinational logic, 
implemented with pass transistors and inverting logic stages. Using the basic ideas already 
presented, we may anticipate the results of that chapter in order to estimate the maximum 
clocking frequency of such systems. 

The design methodology uses a tw'o-phase non-overlapping clock scheme, described in detail 
in chapter 3. During the first clock phase, data passes from one register, through 
combinational logic stages and pass transistors to a second register. During the second clock 
phase, data passes from the second register through still more logic and pass transistors to a 
third (or possibly back to the first) register. The data storage registers are implemented by 
using charge stored on the input gates of inverting logic stages, the charge being isolated by 
pass transistors controlled by clock signals, as described in full detail in chapter 3. 

Since pass transistors are used to connect inverting logic stages, inverter ratios of k ~ 8 are 
required. If the combinational logic between registers is implemented using only pass 
transistors, and if the delays through the pass transistors have been carefully matched to 
those of the inverting logic stages, the total delay will be twice that of the simple k = 8 
inverter. In the absence of strays, the k = 8 inverters have a maximum delay (in the case of 
the output rising towards VDD) of 8t, and hence a minimum of 16t must be allowed for 
the inverter plus logic delay. However, in most designs the stray capacitance is at least equal 
to that inherent in the circuit. Thus the minimum time required for one such operation is 
- 30t. Control lines to the combinational logic and pass transistors each typically drive the 
gates of 10 to 30 transistors. Even when using a super buffer driver, the delay introduced 
by this fan out is at least the minimum driving time for a capacitive load. With Y = 30, 
this time is ~ 9x. To this we must add an 8r inverter delay for operation of the drivers. 

Thus the total time for one clock phase is ~ 50r. Since two clock phases are required per 
cycle, a minimum clocking period of ~ 100r is required for system designed in this way. In 
1978, r ~0.3 nanoseconds, and clocking periods of 30 to 50 ns are achievable in carefully 
structured integrated s\stents where successive stages are in close physical proximity. If it is 
necessary to communicate data over long distances, longer periods are required. 
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Properties of Cross-Coupled Circuits 

In many control sequencing and data storage applications, memory cells and registers are 
built using two inverters driving each other, as shown in figure 13a. This circuit can be set 
in either the state where Vj is high and is low, or in the state where Vj is low and V 2 is 
high. In either case, the condition is stable and will not change to the other condition 
unless it is forced there through some external means. The detailed methods of setting such 
cross-coupled circuits into one state or another will be discussed in detail later. However, it 
is important at the present time to understand the time evolution of signals impressed upon 
cross-coupled circuits, since they exhibit properties different from circuits not having a 
feedback path from their output to an input. 

We have seen that there exists a voltage at which the output of an inverter is approximately 
equal 10 its input voltage. If a cross-coupled circuit is inadvertently placed in a situation 
where its input voltage is equal to this value, then an unstable equilibrium condition is 
created where voltages V, and V, are equal. Since the net current flowing onto either gate is 
now zero, there is no forcing function driving the system to any voltage other than this 
equilibrium one, and the circuit can stay in this condition for an indefinite period. 
However, if either voltage changes, even very slightly, the circuit will leave this unstable 
equilibrium. For example, if the voltage V» is increased from its unstable equilibrium value 
V- by a slight amount, this will in time cause a lowering of voltage V 2 , as net current 
flows from gate 1. This lowering of V 2 will at some later time cause Vj to increase further. 
As lime goes on, the circuit will feedback on itself until it rests in a stable equilibrium state. 

The possibility of such unstable equilibria in cross coupled circuits has important system 
implications^, as we will later see. For this reason, we will make a fairly detailed analysis 
of this circuit's behavior near the metastable state. While it is not essential that the reader 
follow' all the details of the analysis, the final result should be studied carefully. The time 
constant of the final result depends in detail on the regions of operation of the transistors 
near the metastable state, as given in the following analysis. However, the exponential form 
of the result follows simply from the fact that the forcing function pushing the voltage 
away from the metastable point is proportional to the voltage's distance away from that 
point. This general behavior is characteristic of bistable storage elements in any technology. 
However, more complex waveforms are observed in logic families having more than one 
lime constant per stage. 
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The lime evolution of this process can be traced as follows. At the unstable equilibrium, the 
current in the pullups equals that in the pulldowns, and is some constant, kj, times (V| ny - 
Y'lh) 2 . If V! is then changed by some small AVjto V )nit , I pu2 remains constant but I pd 2 
changes immediately, producing a non-zero l g ]_: 

Igl = 'pu2 - 'pd2 ■ klC( V ,nv-V t h> 2 - (V.nv+AVi-Vjh 2 )] 

For small AV|, I gl = “2k 1 (Vj nv -V th )AV 1 . More precisely, since I gi = function(V},V 2 ), 
then near V| ny : 

dl gl /dv { - -2k 1 (V inv -V th ) 

Noting that the pullups are not quite in saturation, but are in the resistive region, and: 

ai gl /av 2 = -i/R pu . 

where R pu = effective resistance of the pullup near Vj nv . Noting that I g ^ = C g dV g2 /dt, we 
find that: 

dl gl /dt = -2k 1 (Vi nv -V th) [dV 1 /dt] - (l/R pu )[dV 2 /dt] = C g [d 2 V 2 /dt 2 ] 

Evaluating the constants in this equation yields -k^Vj nv” v th) = C g /T «' where T o iS the 
saturation transit time of the pulldowns for t near zero. If we assume a pullup/pulldown Z 
ratio of 4:1, and consider the operating conditions near t = 0, then evaluating the effective 
resistance of the pullups in terms of the parameters of the pulldowns yields 1/R pu ~ C g /r 0 . 
Therefore: 

-(2/T 0 )[dV 1 /dt] -(l/r 0 )[dV 2 /dt] = d 2 V 2 /dt 2 
Similarly: ~(2/T 0 )[dV 2 /dt] _ (l/r 0 )[dVj/dt] = d 2 Vj/dt 2 

Near t = 0, dV^/dt approximately equals - dV 2 /dt, and therefore: 

d 2 V 1 /dt 2 = -(l/r 0 ) dV 2 /dt = (1/t 0 ) 2 Vj + const. [eq.!6a] 

The solution to eq. 16a is an exponential diverging from the equilibrium voltage V jnv , with 
a time constant r /2 equal to 1/2 the pulldown delay time. Note that the solution given in 
eq.l6b satisfies the conditions that V(0) = Vj n j p and V(l) = constant if Vj n j t = V niy : 


Vj(t) - Vj nv + (^init ^ inv^ e 


[eq.l6b] 



[Ch.l: MOS Devices and Circuits: Sect. 2] <Conway>mos2.vlsi January 27, 1978 1:46 PM 


The above analysis applies to cross coupled circuits in the absence of noise. Noise 
unavoidably present in the circuit spreads the input voltage into a band from which such an 
unstable equilibrium can statistically be initiated. The width of this band is equal to the 
noise amplitude. Any timing condition which causes the input voltage to settle in this band 
has some probability of causing a balanced condition, from which the circuit may require an 
arbitrary long time to recover. The time evolution of such a system is shown in Fig. 13b, 
for several initial voltages near V jnv . The time for the cross-coupled system to reach one of 
its equilibria is thus logarithmic in the displacement from V inv , and is given approximately 
by eq. 16c: 


t ~ T 0 l n [Vj nv /(Vj n j t -Vj nv )] 


[eq.l6c] 


Effects of Scaling Down the Dimensions of MOS Circuits and Systems 

This section examines the effects on important system parameters resulting from scaling 
down all dimensions of an integrated system, including those vertical to the surface, by 
dividing them by a constant factor a. The voltage is likewise scaled down by dividing by 
the same constant factor a. Using this convention, all electric fields in the circuit will 
remain constant and hence many non-linear factors affecting performance will not change 
as they would if a more complex scaling were used. 

Figure 14a. shows a MOSFET of dimensions L, W, D, with a (Vgs - Vth) = V. Figure 14b. 
shows a MOSFF.T similar to that in figure 14a., but of dimensions L' = L/a. W’ = W/a, 
D' = D/a, and V’ = W/a. Refer to equations 1., 2., and 3. From these equations we will 
find that as the scale down factor a is increased, the transit time, the gate capacitance, and 
drain to source current of every individual transistor in the system scale down 
proportionally, as follows: 

r oc L 2 /V, t’/t = [(L/a) 2 /(V/a)]/[L 2 /V], t = r/a 

C oc LW/D, C/C = (L/a)(W/a)/(D/a)]/[LW/D], C' = C/a 

I cc WV 2 /I.D, 171 = [(WV 2 /«-V(l lVa 2 )]/[WV 2 /LD], I* = 1/a 
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Switching power, P sw , is the energy stored on the capacitance of a given device divided by 
the clock period, or lime between successive charging and discharging of the capacitance. A 
system's clock period is proportional to the r of its smallest devices. As devices are made 
smaller and faster, the clock period is proportionally shortened. Also, the dc power, P^, 
dissipated by any static circuit equals I times V. Therefore, 

p sw <X CV 2 /T CC WvCdL, P sw ' = p sw /a 2 

P dc = IV , .-. P dc ’ = P dc /« 2 

'y 

Both the switching power and static power per device scale down as 1/a . The average dc 
power for most systems can be approximated by adding the total P $w to one-half the total 
dc power which would result if all level restoring logic pulldowns were turned on. The 
contribution of pass transistor logic to the average dc power drawn by the system is due to 
the switching power consumed by the driving circuits which charge and discharge the pass 
transistor control gates. 

The switching energy per device, E sw , is an important metric of device performance. It is 
equal to the power consumed by the device at maximum clock frequency multiplied by the 
device delay, and scales down as follows: 

F a CV 2 ' F ' = F /or* 

r sw ’ • • c sw sw ' u 

The following table summarizes values of the important system parameters for current 
technology, and for a future technology near the limits imposed by physical law: 


1978 19XX 

Minimum Feature Size: 6 jam 0.3 (lffl 

t : ~0.3 ns ~0.02 ns 

Esw: ~10’ 12 J ~2 x 10" 16 J 

S\stem Clock Period: ~30 to 50 ns ~2 to 4 ns 

[see earlier section] 
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A more detailed plot of the channel conductance of an MOS transistor near the threshold 
voltage is shown in figure 15. Below the nominal threshold, the conductance (1/R) is not in 
reality zero, but depends on gate voltage and temperature as follows: 

l/R cc e( V gs- V .hV(kT/q), 

where T is the absolute temperature, q is the charge on the electron, and k is Boltzmann’s 
constant. At room temperature, kT/q ~ 0.025 volts. At present threshold voltages, as in 
the right curve in figure 15., an off device is below threshold by perhaps 20 kT/q, i.e. by 
about 0.5 volts, and its conductance is decreased by a factor of the order of ten million. 
Said another way, if the device is used as a pass transistor, a quantity of charge which takes 
a time t to pass through the on device, will take a time on the order of 10 't to "leak" 
through the off device. The use of pass transistors switches to isolate and "dynamically 
store" charge on circuit nodes is common in many memory applications using 1978 transistor 
dimensions. However, if the threshold voltage is scaled down by a factor of perhaps 5, as 
shown in the left plot in figure 15., then an off transistor is only 4kT/q below threshold. 
Therefore its conductance when "off" is only a factor of 100 or so less than when it is "on". 
Charge stored dynamicaly on a circuit node by the transistor when "on", will safely remain 
on that node for only a few typical system clock periods. The charge will not remain on the 
node for a very large number of periods as in present memory devices using this technique. 

Now, suppose we scale down an entire integrated system by a scale down factor of a = 10. 
The resulting system will have one hundred times the number of circuits per unit area. The 
total power per unit area remains constant. All voltages are reduced by the factor of 10. 
The current per unit area is increased by a factor of 10. The time delay per stage is 
decreased by a factor of 10. The power-delay product decreases by a factor of 1000. 

This is a rather attractive scaling in all ways except for the current density. The delivery of 
the required average dc current presents an important obstacle to scaling. This current must 
be carried to the various circuits in the system on metal conductors, in order that the voltage 
drop from the off-chip source to the on-chip subsystems will not be excessive. Metal paths 
have an upper current density limit imposed by a phenomenon called metal migration, 
discussed further in chapter 2. Many metal paths in today's integrated circuits are already 
operated near their current density limit. As the above type of scaling is applied to a 
system, the conductors get narrower, but still deliver the same current on the average to the 
circuits supplied by them. 
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Therefore, it will be necessary to find ways of decreasing system current requirements to 
approximately a constant current per unit area relative to the present current densities. In 
n-channel silicon gate technology, this objective can be partially achieved by using pass 
transistor logic in as many places as possible and avoiding restoring logic except where it is 
absolutely necessary. Numerous examples of this sort of design are given later in this text. 
This design approach also has the advantages of tending to minimize delay per unit function 
and to maximize logic functions per unit area. However, when scaled down to submicron 
size, the pass transistors will suffer from the subthreshold current problem. It is possible 
that when the fabrication technologies have been developed to enable scaling down to sub¬ 
micron devices, a technology such as complementary MOS, which does not draw any dc 
current, may be preferable to the nMOS technology used to illustrate this monograph. 
However, even if this occurs, the methodology developed in the text can still be applied in 
the design of integrated systems in that technology. 

The limit to the kind of scaling described above occurs when the devices created are no 
longer able to perform the switching function. To perform the switching function, the ratio 
of transistor on to off conductance must be >>1, and therefore the voltage operating the 
circuit must be many times kT/q. For this reason, even circuits optimized for operation at 
the lowest possible supply voltages still require a VDD of ~ 0.5 volts. Devices in 1978 
operate with a VDD of approximately five volts and minimum channel lengths of 
approximately six microns. Therefore, the kind of scaling we have envisioned here will take 
us to devices wdth approximately one half micron channel lengths and current densities 
approximately ten times what they are today. Power per unit area will remain constant over 
that range. Smaller devices might be built but must be used without lowering the voltage 
any further. Consequently the power per unit area will increase. Finally, there appears to 
be a fundamental limit- 5 of approximately one quarter micron channel length, where certain 
physical effects such as the tunneling through the gate oxide, and fluctuations in the 
positions of impurities in the depletion layers, begin to make the devices of smaller 
dimension unworkable. 
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The series of steps by which a geometric pattern or set of geometric patterns is transformed 
into an operating integrated system is called a wafer fabrication process , or simply a 
process . An integrated system in MOS technology consists of a number of superimposed 
layers of conducting, insulating, and transistor forming materials. By arranging 
predetermined geometric shapes in each of these layers, a system of the required function 
may be constructed. The task of the integrated system designer is to devise the geometric 
shapes and their locations in each of the various layers of the system. The task of the 
process itself is to create the layers and transfer into each of them the geometric shapes 
determined by the system design. 

Modern wafer fabrication is probably the most exacting production process ever developed. 
Since the 1950's, enormous human resources have been expended by the industry to perfect 
the myriad of details involved. The impurities in materials and chemical reagents are 
measured in parts per billion. Dimensions are controlled to a few parts per million. Each 
step has been carefully devised to produce some circuit feature with the minimum possible 
deviation from the ideal behavior. The results have been little short of spectacular: chips 
with many tens of thousands of transistors be produced for under ten dollars. In addition, 
wafer fabrication has reached a level of maturity where the system designer need not be 
concerned with the fine details of its execution. The following sections present a broad 
overview' sufficient to convey the ideas involved, and in particular those relevant for system 
design. Our formulation of the basic concepts anticipates the evolution of the technology 
towards ever finer dimensions. 

In this chapter we describe the patterning sequence and how it is applied in a simple, 
specific integrated system process: nMOS. A number of other topics are covered which are 
related to the processing technology, or are closely tied to the properties of the underlying 
materials. 
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Patterning 

The overall fabrication process consists of the successive patterning of a particular sequence 
of layers. The patterning steps by which geometrical shapes are transferred into a layer of 
the final system, is very similar for each of the layers. The overall process is more easily 
visualized if we first describe the details of patterning one layer. We can then describe the 
particular sequence of layers used in the process to build up an integrated system, without 
repeating the details of patterning for each of the layers. 

A common step in many processes is the creation of a silicon dioxide insulating layer on the 
surface of a silicon wafer, and the selective removal of sections of the insulating layer 
exposing the underlying silicon. We will use this step for our patterning example. The step 
begins with a bare polished silicon wafer, shown in cross section in figure 1. The wafer is 
exposed to oxygen in a high temperature furnace to grow a uniform layer of silicon dioxide 
on its surface, as shown in figure 2. After the wafer is cooled, it is coated with a thin film 
of organic resist material as shown in figure 3. The resist is thoroughly dried and baked to 
insure its integrity. The wafer is now ready to begin the patterning. 

At the time of wafer fabrication the pattern to be transferred to the wafer surface exists as 
a mask. A mask is merely a transparent support material coated with a thin layer of opaque 
material. Certain portions of the opaque material are removed, leaving opaque material on 
the mask in the precise pattern required on the silicon surface. Such a mask with the 
desired pattern engraved upon it is brought face down into close proximity with the wafer 
surface as shown in figure 4. The dark areas of opaque material on the surface of the mask 
are located where it is desired to leave silicon dioxide on the surface of the silicon. 
Openings in the mask correspond to areas where it is desired to remove silicon dioxide from 
the silicon surface. When the mask has been brought firmly into proximity with the wafer 
itself, its back surface is flooded with an intense source of ionizing radiation such as 
ultraviolet light or low energy x-rays. The radiation is stopped in areas where the mask has 
opaque material on its surface. Where there is no opaque material on the mask surface, the 
ionizing radiation radiation passes on through and into the resist, the silicon dioxide, and 
silicon. While the ionizing radiation has little effect on the silicon dioxide and the silicon, 

■ft 

it breaks down the molecular structure of the resist into considerably smaller molecules. 

* We have chosen to illustrate this text using positive resist, i.e. the resist material remaining after exposure and 
development corresopponds to the opaque mask areas. Negative resists are also in common use. Positive resists 
arc t\piculI) workable to finer feature sizes, and are likely to become dominant as the technology progresses. 
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Figure 5. Exposed Resist 
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After exposure to the ionizing radiation, the wafer has the characteristics shown in figure 5. 
In areas exposed to the radiation, the resist molecules have been broken down to much 
lighter molecular weight than that of unexposed resist molecules. The solubility of organic 
molecules in various organic solvents is a very steep function of the molecular weight of the 
molecules. Hence, it is possible to dissolve exposed resist material in solvents which will not 
dissolve the unexposed resist material. In this way the resist can be "developed” as shown in 
figure 6 by merely immersing the silicon wafer in a suitable solvent. 

Thus far, the pattern originally existing as a set of opaque geometries on the mask surface 
has been transferred into a corresponding pattern in the organic resist material on the 
surface of the silicon dioxide. This same pattern can now be transferred to the silicon 
dioxide itself by exposing the wafer to a material which will etch silicon dioxide but not 
attack either the organic resist material or the silicon wafer surface. This etching step is 
usually done with hydrofluoric acid, which easily dissolves silicon dioxide. However, organic 
materials are very resistant to hydrofluoric acid, and it is incapable of etching the surface of 
silicon. The result of this etching step is shown in figure 7. 

The final step in patterning is removal of the remaining organic resist material. Three 
techniques have been used to remove resist materials. Strong organic solvents will dissolve 
even unexposed resist material. Strong acids such as chromic acid actively attack organics. 
The wafer can be exposed to atomic oxygen which will oxidize away any organic materials 
present on its surface. Once the resist material is removed, the finished pattern on the 
wafer surface is as shown in figure 8. Notice that we have transferred the geometric pattern 
which originally existed on the surface of the mask directly into the silicon dioxide on the 
wafer surface. While a foreign material was present on the wafer surface during the 
patterning process, it has now disappeared and the only materials present are those which 
will be part of the finished wafer. 

A similar sequence of steps is used to selectively pattern each of the layers of the integrated 
system. These differ only in the details of the etchants used, etc. Thus as we study the 
processing of the various layers, the reader need not visualize all the details of the 
patterning sequence for each layer, but only recognize that a mask pattern for a layer can be 
transferred into a pattern in the material of that layer. 
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4 


Scaling of Patterning Technology 

As discussed in chapter 1, semiconductor devices could be at least an order of magnitude 
smaller in linear dimension than those typically manufactured in 1977 and still function 
correctly. The fundamental dimensional limitation is approximately a one quarter micron 
channel length, corresponding to a length unit X (to be discussed under design rules) of 
approximately 0.1 micron. This limitation appears to apply to both bipolar and MOS 
technologies. It has been possible for several years to create sub-micron lines using electron 
beam and x-ray techniques, and there is considerable research and development under way 
to bring these patterning technologies into general manufacturing use. It appears that there 
are no fundamental barriers preventing creation of patterns for ultimately small devices. A 
more detailed discussion of the techniques involved is given in chapter 4. 


The Silicon Gate n-Channel MOS Process 

We now describe the particular sequence of patterned layers used to build up nMOS 
integrated circuits and systems. Figures 9 through 14 illustrate a simple but complete 
sequence of patterning and processing steps which are sufficient to fabricate a complete, 
integrated system. The example follows the fabrication of one simple circuit within a 
system, but all other circuits are simultaneously implemented by the same process. The 
example used is the basic inverter circuit. The top illustration in figures 9 through 14 shows 
the top view of the layers of the circuit layout. The lower illustration in each of those 
figures shows the cross section through the cut indicated by the downward arrows. The 
vertical scale in these cross sections has been greatly exaggerated for illustrative purposes. 

The opening in the opaque material of the first mask is shown by the green outline in the 
top portion of figure 9. This opening exposes all areas that will be eventually be the 
diffusion level. It includes the sources and drains of all transistors in the circuit, together 
with the transistor gate areas, and any diffusion level circuit interconnection paths. This 
mask is used for the first step in the process, the patterning of silicon dioxide on silicon as 
described in the previous section. The resulting cross section is shown in the lower portion 
of figure 9. 
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The second step in the process is to differentiate transistors which are normally "on" 
(depletion mode) from those which are normally "off" (enhancement mode). This is done 
by overcoating the wafer with resist material, exposing the resist material through openings 
in a second mask, and developing it in the manner shown in figure 10. This patterning step 
leaves an opening in the resist material over the area to be selectively turned into depletion 
mode transistors. The actual conversion of the underlying silicon is then done by 
implanting ions of arsenic or antimony into the silicon surface. The resist material, where 
present, acts to prevent the ions from reaching the silicon surface. Therefore, ions are only 
implanted in the silicon area free of resist. The implanted layer, which causes a slight n- 
type conductivity in the underlying silicon, is shown by the yellow box in figure 10. Once 
the depletion areas are defined, the resist material is removed from the surface of the wafer. 

The w'afer is then heated while exposed to oxygen, to grow a very thin layer of silicon 
dioxide over its entire surface. It is then entirely coated with a thin layer of polycrystalline 

silicon, usually called polysilicon or poly for short. Note that this polysilicon layer is 

everywhere insulated from the underlying materials by the layer of thin oxide, and 
additionally by thicker oxide in some areas. The polysilicon will form the gates of all the 
transistors in the circuit and will also serve as a second layer for circuit interconnections. A 
third mask is used to pattern the polysilicon by steps similar to those previously described, 
with the result shown in red in figure 11. The left-most polysilicon area will function as 
the gate of the pull down transistor of the inverter we are constructing, while the square to 
the right will function as the gate of the depletion mode pull up transistor. 

Once the polysilicon areas have been defined, n-type regions can be diffused into the p- 
type silicon substrate, forming the sources and drains of the transistors and the first level of 
interconnections. This step is done by first removing the thin gate oxide in all areas not 
covered by the (red) polysilicon. The wafer is then exposed to n-type impurities such as 

arsenic, antimony or phosphorus at high temperature for a sufficient period of time to 

allow these impurities to convert the exposed underlying silicon to n-type material. The 
areas of resulting n-type material are showm in green. Notice, in the cross section of figure 
12., that the red polysilicon area and the thin oxide under it act to prevent impurities from 
diffusing into the underlying silicon. Therefore the impurities reach the silicon substrate 
only in areas not covered by the polysilicon and not overlain by the thick original oxide. In 
this way the active transistor area is formed in ail places where the patterned polysilicon 
overlies the thin oxide area defined in the previous step. The diffusion level sources and 
drains of the transistors are automatically filled in between the polysilicon areas and extend 
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up to the edges of the thick oxide areas. The major advantage of the silicon gate process is 
that it does not require a critical alignment between a mask which defines the green source 
and drain areas and a separate mask which defines the gate areas. Rather, the transistors are 
formed by the intersection of the two masks, and the conducting n-type diffused regions are 
formed in all areas where the green mask is not covered by the red mask. 

All the transistors of the basic inverter circuit are now defined. Connections must now be 
made to the input gate, between the gate and source of the pullup, and to VDD and GND. 
These interconnections will be made with a metal layer that can make contact to both the 
diffused areas and the polycrystalline areas. However, in order to ensure that the metal does 
not make contact to underlying areas except where intended, another layer of insulating 
oxide is coated over the entire circuit. At the places where the overlying metal is to make 
contact to either the polysilicon or the diffused areas, the overlying oxide is selectively 
removed by the patterning process as previously described. The result of coating the wafer 
with the overlying oxide and removing this oxide in places where contacts are desired, is 
shown in figure 13. In the top view, the black areas are those defined by openings in the 
contact mask, the fourth in the process's sequence of mask patterns. In cross section notice 
that in the contact areas all oxide has been removed down to either the polycrystalline 
silicon or the diffused area. 

Once the overlying oxide has been patterned in this way, the entire wafer is coated with 
metal, usually aluminum, and the metal is patterned with a fifth mask to form the 
conducting areas required by the circuit. The top view in figure 14 shows three metal lines 
running vertically, the left most connecting to the input gate of the inverter, the center one 
being ground, and the right-most one forming the VDD connection to the inverter. The 
peculiar structure* formed by the metal square slightly to the right of center connects the 
polysilicon gate of the depletion mode pull up transistor to its source and to the drain of 
the pull down transistor. Rather than making two separate contacts from the metal line to 
the pullup's polysilicon gate region and to the adjacent diffusion region, area can be 
conserved by coalescing the contacts into the compact arrangement shown. This geometrical 
arrangement is known as a butting contact and will be used extensively throughout the text. 

*ln general, it is good practice to avoid placing contacts over active transistor area whenever possible. However, 
blitting contacts in the location shown here reduce the area and simplify the geometry of the basic inverter and 
many other circuits, and have been so placed by the authors in many systems run on a number of commercial 
fabrication processes without ill effects. A more conservative approach would be to place the blitting contact 
adjacent to, rather than over, the active pullup area. 
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The inherent properties of the silicon gate process allow the blue metal layer to cross over 
either the red polysilicon layer or the green diffused areas, without making contact unless 
one is specifically provided. The red polysilicon areas, however, cannot cross the green 
diffused areas without forming a transistor. The transistors formed by the intersection of 
these two masks can be either enhancement mode if no yellow implantation is provided, or 
depletion mode if such an implantation is provided. Hence, the enhancement mode 
transistors are defined by the intersection of the green and red masks while the depletion 
mode transistors are defined by the intersection of the green, red and yellow masks. 

If we wish to fabricate only a small number of prototype system chips and to have access to 
the metal level for the probing of test points, the wafer fabrication sequence can be 
terminated at this step. However, when fabricating large numbers of chips of a debugged 
design, the wafer surface is usually coated with another layer of oxide. This step, called 
overglassing, provides physical protection for the devices in the system. A sixth mask is 
then used to pattern contact cuts in the overglassing at the locations of relatively large metal 
contact pads. 

Each wafer contains many individual chips. The chips are separated by scribing the wafer 
surface with a diamond scribe, and then fracturing the wafer along the scribe lines. Each 
individual chip is then cemented in place in a package, and fine metal wire leads are bonded 
to the metal contact pads on the chip and to pads in the package which connect with its 
external pins. A cover is then cemented over the recess in the package which contains the 
silicon chip, and the completed system is ready for testing and use. 


Yield Statistics 

Of the large number of individual integrated system chips fabricated on a single silicon 
wafer, only a fraction will be completely functional. Flaws in the masks, dust particles on 
the wafer surface, defects in the underlying silicon, etc., all cause certain devices to be less 
than perfect. With present design techniques, any single flaw of sufficient size will kill an 
entire chip. 

The simplest model for the yield, or the fraction of the chips fabricated which do not 
contain fatal flaws, assumes (naively) that the flaws are randomly distributed over the wafer, 
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and that one or more flaws anywhere on a chip will cause it to be non-operative. If there 
are N fatal flaws per unit area, and the area of an individual chip is A, the probability that 
a chip has n flaws is in the simplest case just given by the Poisson distribution, P n (NA). 
The probability of a good chip is: 

P 0 (NA) = e' NA [eq.l] 

While this equation does not accurately represent the detailed behavior of real fabrication 
processes, it is a good approximate model for estimating the yield of alternative designs. 
The exponential is such a steep function that a very simple rule is possible: chips with areas 
many times 1/N will simply never be found without flaws. Areas must be kept less than a 
few times 1/N if one flaw will kill a system. Design forms may be developed in the future 
which will permit systems to work even in the presence of flaws. If such forms are 
developed, the entire notion of yield will be completely changed and much larger chips will 
be possible. 

Once a wafer has been fabricated, each chip must be tested to determine if it is functional. 
Testing of simple combinatorial logic networks is straightforward and may be done 
completely. Complete testing of complex systems with internal sequencing is not in general 
possible, and most integrated system chips manufactured, even at 1978 levels of complexity, 
are not economically testable even for a small fraction of their possible internal states. 

As time passes and the number of devices per chip continues to increase, it becomes 
important to consider including special functions in the design of integrated systems to 
improve their testability. The basic problem is to linearize an otherwise combinatorial 
problem. One approach to this is: 

(i) Define the entire system as a set of register to register transfer blocks, i.e. successive 
stages of storage registers with combinational logic between them. 

(ii) Provide for reading and writing from the external world to/from each of the storage 
registers. 


The storage locations are first tested independently for their ability to store data or control 
information. If all storage locations pass this test, each combinational logic block can be 
tested separately, by use of its input and output storage locations. Such a test becomes 
essentially linear in the number of components, and may be accomplished in an acceptable 
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time period, even for extremely complex systems. However, without access to the individual 
storage locations, testing rapidly becomes hopeless. For this reason even present day 
microprocessors are very incompletely tested. When one is used for a while, an apparently 
new and sudden malfunction may simply be the first occurrence of a particular state of 
control and data in the system, and thus may represent the first time the device had been 
"tested" under those conditions. 

From experience gained in testing memory parts, it is known that the behavior of one 
circuit can be influenced by the state of a nearby circuit. For example, a memory cell may 
be able to remember both a logic-1 and a logic-0 if its neighbor is at a logic-0, but may be 
able to retain only a logic-0 if its neighbor is at a logic-1. Failures of this type are 
dependent upon the data patterns present in the system, and are known as paiiern sensitive 
failures. In a reasonable (or even an unreasonable) time, it is not possible to exercise even a 
minute fraction of all the combinations of bit patterns of many integrated systems. What is 
done instead is to apply our knowledge of the physics of such failures, and construct a 
model for possible failure modes. In the memory example, we may conclude that any flaw 
not visible optically will be unable to reach beyond the immediate locality of the cell 
involved. Hence, pattern sensitivity in the behavior of a particular cell may be introduced 
by other cells in the same row or column of an array of memory cells, or by diagonal 
nearest neighbors. A test for pattern sensitivity under this model is quite fast, being only 
slightly worse than a linear function of the number of devices on the chip. 

In order to test for pattern sensitive failures, we must construct a physical model for the 
possible failure mechanisms. This model will inevitably include the physical proximity of 
other signals. For this reason, any practical test for pattern sensitive failures must be based 
on a knowledge of the physical location of the various elements of the subsystem being 
tested. The task of preparing such tests is thus greatly eased by regularity in the design and 
physical layout of a system. 



[Ch.2; Integrated System Fabrication: Sect.l] <Conway>fabl.vlsi January 27, 1978 11:31 AM 


Scaling of the Processing Technology 

In order to have a complete process for sub-micron transistors, it is necessary not only to 
make patterns in the resist material but to transfer these patterns to the underlying layers in 
the silicon and silicon dioxide. Traditionally, wet etching processes have been used. 
However, wet etching processes do not scale well into the sub-micron range. Alternatives 
are currently being developed which appear workable. Etching with plasmas (i.e, glow 
discharges of gaseous materials resulting in free ions of great chemical activity) is already 
used in a number of advanced processing laboratories. It is known that very well controlled 
etching can be achieved in this way and it seems likely that essentially no wet processing 
will be used in the construction of sub-micron devices. Ion implantation, an ideal method 
for achieving controlled doses of impurity ions in the silicon surface, is already a common 
production technique in essentially all MOS processing facilities. Metal layers for sub¬ 
micron processes must be thicker in relationship to their width than today’s commercial 
processing technology allows. A possible solution to this problem may be the use of a 
process known as ion milling for metal patterning. In this process ions of modest energy 
sputter away any metal not covered with resist material, yielding much steeper sides on the 
metal thus patterned than do current wet etching processes. 

It appears that the basic technological pieces exist to enable development of a complete 
patterning and wafer fabrication process at sub-micron dimensions. In reality, the ultimate 
submicron process will not emerge full-blown, but dimensions will gradually be reduced, as 
one after another of the myriad of technological difficulties are surmounted. The sketch we 
have given is rather an artist’s conception of the possibility of such an ultimate process. We 
do believe, however, that the evolution of this process is of fundamental importance to the 
entire electronics industry. 
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Design Rules 

Perhaps the most powerful attribute of modern wafer fabrication processes is that they are 
pattern independent. That is, there is a clean separation between the processing done during 
wafer fabrication and the design effort which creates the patterns to be implemented. This 
separation requires a precise definition to the designer of the capabilities of the processing 
line. This specification usually takes the form of a set of permissible geometries which may 
be used by the designer with the knowledge that they are within the resolution of the 
process itself and that they do not violate the device physics required for proper operation 
of transistors and interconnections formed by the process. When reduced to their simplest 
form, such geometrical constraints are called design rules. The constraints are of the form 
of minimum allowable values for certain widths , separations, extensions, and overlaps of 
geometrical objects patterned in the various levels of a system. 

As processes have improved over the years, the absolute values of the permissible sizes and 
spacings of various layers have become progressively smaller. There is no evidence that this 
trend is abating. In fact, there is every reason to believe that at least another order of 
magnitude of shrinkage in linear dimensions is possible. For this reason we present a set of 
design rules in dimensionless form, as constraints on the allowable ratios of certain 
distances to a basic length unit. The basic unit of length measurement used is equal to the 
fundamental resolution of the process itself. This is the distance by which a geometrical 
feature on any one layer may stray from another geometrical feature on the same or on 
another layer, all processing factors considered and an appropriate safety factor added. It is 
set by phenomena such as overetching, misalignment between mask levels, distortion of the 
silicon wafer ("runout") due to high temperature processing, over or underexposure of resist, 
etc. All dimensions are given in terms of this elementary distance unit, which we refer to as 
the length-unit , A. In 1978 the length-unit A ~ 3 microns. One micron (/xm) = 10’ 6 meters. 

The rules given below have been abstracted from a number of processes over a range of 
values of A, corresponding to different points in time at different fabrication areas. They 
represent somewhat of a "least common denominator " likely to be representative of nMOS 
design rules for a reasonable period of time, as the value of A decreases in the future. 
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A typical minimum for the line.width of the diffused regions is 2A, as shown in figure 

15. The spacing required between two electrically separate diffused regions is a parameter 
which depends not merely upon the geometric resolution of the process, but also upon the 
physics of the devices formed. If two diffused regions pass too close together, the depletion 
layers associated with the junctions formed by these regions may overlap and result in a 
current flowing between the two regions when none was intended. In typical processes a 
safe rule of thumb is to allow 3A of separation, between any two diffused regions 
which are unconnected, as shown in figure 16. The width of a depletion layer associated 
with any diffused region depends upon the voltage on the region. If one of the regions is at 
ground potential, its depletion layer will of necessity be quite thin. In addition some 
processes provide a heavier doping level at the surface of the wafer between the diffused 
areas in order to alleviate the problem of overlap of depletion layers. In cases where either 
very low voltage exists on both diffused regions or where a heavily doped region has been 
implanted in the surface between the diffused areas, it is often possible to space diffused 
areas 2A apart. However, this should not be done without carefully checking the actual 
process by which the design is to be fabricated. 

The minimum for the width W p of polysilicon lines is similarly 2A. No depletion layers 
are associated with polysilicon lines, and therefore the separation, S pp , of two such lines 
may be as litle as 2A. These rules are illustrated in figures 17 and 18. 

We have so far considered the diffused and polysilicon layers separately. Another type of 
design rule concerns how the two layers interact with each other. Figure 19 shows a 
situation where a diffused line is running parallel to an independent polysilicon line, to 
which it is not anywhere connected. The only consideration here is that the two 
unconnected lines not overlap. If they did they would form an unwanted capacitor. 
Avoidance of this overlap requires a separation S pc j of only A between the two regions as 
shown in figure 19. A slightly more complex situation is shown in figure 20, where a 
polysilicon gate area intentionally crosses a diffused area, thereby forming a transistor. In 
order to make absolutely sure that the diffused region does not reach around the end of the 
gate and short out the drain to source path of the transistor with a thin diffused area, it is 
necessary for the polysilicon gate to extend a distance E pc j of at least 2A beyond the 
nominal boundary of the diffused area, as shown in figure 20. 

A composite of several of these design rules is shown in figure 21. Note that the minimum 
width for a diffused region applies to diffused regions formed between a normal boundary 
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of the diffused region and an edge of a transistor as well as to a diffused line formed by 
two normal boundaries. This situation is illustrated in the lower left corner of the figure. 

As we have seen in figure 10, ion implantation in the region which becomes the gate of a 
transistor will convert the resulting transistor into the depletion mode type. It is important 
that the implanted region extend outward beyond all four boundaries of the gate region, as 
shown in figure 22. To avoid any possibility that some small fraction of the transistor 
might remain enhancement mode, the yellow ion implantation region should extend a 
distance E ig of at least IVz A beyond each edge of the gate region. The separation S ig 
between an ion implantation region and an adjacent enhancement mode transistor gate 
region should also be at least IV 2 X. Both situations and their design rules are illustrated in 
the figure. 

A contact may be formed between the metal layer and either the diffused level or the 
polysilicon level by means of the contact mask. A set of rules apply to the amount by 
which each layer must provide an area surrounding any contact to it, so that the contact 
opening not find its way around the layer to something unintended below it. Since no 
physical factors apply here other than the relative registration of two levels, a very simple 
set of design rules results. Each level involved in a given contact must extend beyond the 
outer boundary of the contact cut by X at all points, as illustrated by extension distances 
E dc , E pc , and E mc , in figures 23, 24, and 26. The contacts themselves, like the minimum 
width lines in the other levels, must be at least 2A long and 2A wide (W c ). This situation is 
illustrated for the diffusion and polysilicon levels in figures 23 and 24. When making 
contact between a large metal region and a large diffused region, many small contacts spaced 
2 A apart should be used, as shown in figure 25. Contact cuts to diffusion should be at least 
2A from the nearest gate region, as shown in figure 25. 

A special case arises when it is desired to connect a polysilicon layer directly to a diffused 
layer, using the butting contact. The detailed geometric layout of the butting contact is 
shown in figure 26. In its minimum sized configuration, it is composed of a square region 
of diffision 4 A on a side, overlapped by a 3 A by 4 A rectangle of polysilicon. A 
rectangular contact cut, 2 A by 4 A in size, is made in the center of this structure. The 
structure is then overlayed with metal, thus connecting the polysilicon to the diffusion. The 
rules involved in figure 26 are identical to those given so far, with the addition of a 
minimum of one A overlap, O pc j, of the diffused and polysilicon layers in the center area of 
the contact. 
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In considering the design rules for the metal layer, notice that this layer in general runs over 
much more rugged terrain than any other level, as can be seen by referring to the cross 
section of figure 14. For this reason it is generally accepted practice to allow somewhat 
wider minimum lines and spaces for the metal layer than for the other layers. As a good 
working rule 3A widths (W m ), and 3A separations (S mm ) between independent metal lines 
should be provided, as shown in figure 27. The metal layer must surround the contact layer 
in much the same way that the diffused and polysilicon layers did. Since the resist material 
used for patterning the metal generally accumulates in the low areas of the wafer, it tends to 
be thicker in the neighborhood of contact than elsewhere. For this reason metal tends to be 
slightly larger after patterning in the vicinity of a contact than elsewhere. It is generally 
sufficient to allow only one A of space around the contact region for the metal, as for the 
other two layers. The rule for metal surrounding contacts is shown in figure 28. 

The above design rules are likely to remain valid as the length-unit A scales down in size 
with the passage of time. Occasionally, for specific commercial fabrication processes, some 
one or more of these rules may be relaxed or replaced by more complex rules, enabling 
slight reductions in the area of a system. While these details may be important for certain 
competitive products such as memory systems, they have the disadvantgage of making the 
system design a captive of the process specific design rules. Extensive redesign and checking 
is required to scale down such a design as the length-unit scales down. For this reason, we 
recommend use of the dimensionless rules given, especially for prototype systems. Designs 
implemented according to these rules are easily scaled, and may have reasonable longevity. 


Formal Description of Design Rules 


{ in preparation } 
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Electrical Parameters 


By satisfying the constraints imposed by the design rules, designers may create circuit layout 
patterns with the knowledge that the appropriate transistors, lines, etc., produced by the 
wafer fabrication process will be as originally specified in their layout patterns. To 
complete a design it is necessary to also know the electrical parameters of the transistors, 
diffused layers, polysilicon layers, etc., so that the performance of circuits can be evaluated. 
The resistances per square of the various layers and the capacitance per square micron with 
respect to underlying substrate are shown in Table 1. Note that the resistance of a square of 
material contacted along two opposite sides is independant of the size of the square, and 
equals the resistivity of the material divided by its thickness. The tabulated values are 
typical of processes running in 1978. As the circuit dimensions are scaled down by dividing 
by a factor a, the parameters scale approximately as shown in the table. 


Resistances: 

Metal 

~ 0.1 ohms/D 

Resistances/square scale up by a., 


Diffusion 

~ 10 ohms/D 

as dimensions scale down by a, 


Poly 

- 15-100 ohms/D 

except that the transistor 


Transistor 

- 10 4 ohms/D 

R/D is independent of a 

Capacitances: 

Gate-channel 

~ 4xl0~ 4 pf//Ltm 2 

2 

Capacitances/micron scale 


Diffusion 

~ 0.8xl0” 4 pf/fim 2 

up by a., as dimensions 


Poly 

- 0.4xl0“ 4 pf//xm 2 

scale down by a. 


Metal 

~ 0.3xl0~ 4 pf//xm 2 



Table 1. Typical MOS Electrical Parameters (1978). 

The relative resistance values of metal, diffusion, poly, and drain to source paths of 
transistors are quite different. Diffusion and good polysilicon layers have approximately 
one hundred times the resistance per square area of the metal layer. A fully turned on 
transistor has approximately one thousand times the resistance of the diffused and 
polysilicon layers. The capacitances are not as wildly different as the resistances of the 
various layers. Compare the capacitances in Table 1 to the gate to channel capacitance, as a 
reference. The diffused areas typically have one fifth the capacitance per square micron. 
Polysilicon on thick oxide, has approximately one tenth, and the metal layer slightly less 
than one tenth, of the gate-channel capacitance per square micron. 
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The relative values of the resistances and capacitances are not expected to vary dramatically 
as the processes evolve towards smaller dimensions, with the exception of the transistor 
resistance per square, which is independent of a. One note of warning: There is a wide 
range of possible values of polysilicon resistance for different commercial processes. 
Polycrystalline silicon suffers from inordinately high resistances at the crystal grain 
boundaries if the doping level in the polysilicon itself is not held quite high. This disease 
does not affect the diffused layers. For this reason, any processing which tends to degrade 
the doping levels in the diffused and polysilicon layers, affects the polysilicon resistance 
much more dramatically than the resistance of the diffused area. It is in general difficult to 
design circuits which are optimum over the entire range of polysilicon resistivity. If a 
circuit is to be run on a variety of fabrication lines, it is desirable for the circuit to be 
designed in such a way that no appreciable current is drawn through a long thin line of 
polysilicon. In an important example in Chapter 5., polysilicon lines are used as buses along 
which information flows. The timing of these buses can be dramatically affected by the 
resistance of the polysilicon. However, the protocol used on these busses has the polysilicon 
lines precharged during one period of a clock and then pulled low by the appropriate bus 
source during a following clock period. In this way the circuit is guaranteed to work 
independent of the resistance of the poly. However, it may be considerably slower in 
processes of high poly resistivity. 
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Current Limitations in Conductors 

One limit which is not covered in either the design rules or the electrical parameters section 
is that associated with the maximum currents through metal conductors. There is a physical 
process called metal migration whereby a current flux through a metal conductor, exceeding 
a certain limit, causes the metal atoms to move slowly in the direction of the current. If 
there is a small constriction in the metal, the current density will be higher and therefore 
more metal atoms will be carried forward from that point, narrowing the point still more. 
Hence, metal migration is a destructive mechanism causing open circuits in the metal layer 
carrying heavy currents. 

For metals like aluminum this limit is a few times 10 5 amperes per square centimeter. This 
limit does not interfere too drastically with the design of integrated systems in current MOS 
technologies. However, many metal conductors in present integrated systems are operated 
near their current limit, and currents do not scale well as the individual elements are made 
smaller. Applying the scaling rules developed earlier, we found that the power per unit area 
is independent of the scale down ratio. However, the supply voltage decreases and therefore 
the current per unit area increases as the devices are scaled down. For this reason it will not 
be possible to use processes for very large scale integrated systems where the metal thickness 
scales in the same way as other dimensions in the circuit. Much work will likely be done to 
develop processes enabling fabrication of metal lines of greater height relative to width than 
is presently possible. 

Short pulses of current are known to contribute much less to metal migration than steady 
direct current. Nanosecond pulses of currents two orders of magnitude higher than the dc 
limit given above may be carried in metal conductors without apparent damage. Therefore, 
switching current may not be as damaging to metal conductors as a steady current. 

These effects strongly favor processes like CMOS which do not require static dc current, and 
favor design methodologies which maximize system function per unit dc current. 
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A Closer Look at Some Details 

Thus far our discusssion of fabrication has been a general one, adequate for readers whose 
primary interest is in the systems aspects of VLSI. The following sections involve a more 
detailed examination of the capacitance of several important structures and a discussion of 
the relative merits and scaling behavior of several common processes. We suggest that the 
reader just skim through these sections during the first reading of this text. 

In Table 1 we gave typical capacitance for the various layers to the substrate. These 
capacitances are those which would be measured if the voltage on the particular layer were 
zero (relative to the substrate). The dependence upon voltage of the capacitance of the 
different layers may sometimes be important and we will now discuss how this dependence 
arises. References Rl, R2, R4, and Reference 4 of Chapter 1, are good sources for those 
wishing more background information on the concepts of device physics used in this text. 

When a negative voltage is applied to an n-type diffused region relative to the p-type bulk 
silicon, the negative electrons are pushed out of the n-type layer into the bulk and a current 
flows. In integrated systems we are careful to never allow the voltage on the n-type 
diffused regions to be more negative than the p-type bulk. Diffused regions are biased 
positively with respect to the p-type bulk, resulting in a reversed biased p/n junction. With 
the exception of a small leakage current, the reverse biased p/n junction acts merely to 
isolate one diffused region from another. The p-type bulk of our integrated system has a 
small number (typically 10 15 -10 16 per cubic centimeter) of impurity atoms. When a 
voltage is applied to an n-type diffused region, its influence is felt well out into the p-type 
bulk. Positive charge carriers in the p-type bulk are repelled from the positively charged n- 
type layer, thereby exposing negatively charged impurity ions. The region surrounding the 
n-type diffused layer which has been depleted of positive charge carriers is referred to as a 
depletion layer and is showm schematically in Figure 29b. As the voltage on the n-type 
layer is increased, charge carriers are pushed further back from the junction between the n- 
type layer into the p-type bulk, widening the depletion layer and exposing more charged 
impurity ions. The charge thus induced in the depletion layer as the voltage on the n-type 
diffused region is increased is responsible for the capacitance of the n-type diffused region 
relative to the substrate. 

We will now consider a unit area of the junction. The total charge in the depletion layer 
per unit area is proportional to the number per unit volume of impurity ions in the bulk 
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(N), and the width, s Q , of the depletion layer. 

Total charge/area cc Ns 0 

The electric field in the region is proportional to the charge per unit area. 

Electric field cc charge/area cc Ns Q 

The voltage between the n-type diffused layer and the p-type bulk on the far side of the 
depletion layer is proportional to the electric field times thickness of the depletion layer, 
and therefore to the density of negatively charged ions in the depletion layer times the 
square of the width of the depletion layer. 

Voltage electric field X s Q Ns 0 2 

The capacitance per unit area is just the charge per unit area divided by the voltage across 
the depletion layer. From the above equations the capacitance is proportional to the square 
root of the density of impurity atoms in the p-type bulk divided by the voltage. 

Capacitance/area = Q/V cc l/s 0 <x (N/V) 1 ^ 

This relationship is plotted in Figure 30. Notice that the capacitance tends towards infinity 
as the voltage across the junction tends to zero. It would seem that this large capacitance 
would be disastrous for the performance of our integrated systems. However, this proves 
not to be the case. When the p/n junction was formed the n-type region had an excess of 
negative charge carriers while the p-type bulk had an excess of positive charge carriers. 
When the two were brought together to form the junction, there was no voltage to prevent 
charge carriers of either type from flowing over into the opposite region. This initial flow 
caused the n-type layer to become more positive than the p-type layer. This flow ceased 
when just enough voltage built up to stop it. In silicon the voltage required to prevent the 
flow of charge carriers in such a situation is approximately 0.7 volt. Thus the true voltage 
across the junction is this initial "built-in" voltage plus the voltage we apply in our circuit. 
The variation of the capacitance per unit area with applied voltage is shown in Figure 31. 
An approximate equation which can be used to calculate the junction capacitance C: per unit 
area of diffused layers as a function of the applied voltage is given by: 
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Cj = 4.5X10~ 12 [N/(V+0.7)] V2 pf//xm 2 

In this equation, N, the density of impurity ions in the p-type bulk, should be given in 
number per cm^. The voltage is in volts and the capacitance per unit area is evaluated in 
picofarads per square micron. This equation is adequate for most design purposes. 

Aside from the diffused regions, there are two other situations where the capacitance is of 
interest. The first is poly or metal over thick oxide and the second is the gate of an MOS 
transistor. We will discuss poly or metal over oxide first. Figure 32 illustrates once more 
the capacitance per unit area of a junction over the p-type bulk. If the poly or metal layer 
was lain on an oxide much thinner than the depletion layer, its capacitance would be nearly 
the same as that of the corresponding p/n junction. However, if an oxide is interposed 
whose thickness is of the order of the depletion layer thickness, the capacitance of the poly 
or metal line will be decreased. The formula which applies in this case is given by: 

l^total = l^j + 1/C ox 

A typical dependence is shown in Figure 32. For an oxide thickness d, C ox = 3.5XlCT 2 /d, 
where the thickness d is given in angstrom units (10~ 4 microns), and the result is in 
picofarads per square micron as before. 

The most spectacular voltage dependence of a capacitance in the technology we will be using 
is that of the gate of an MOS transistor. When the gate voltage V gs is less than the 
threshold voltage V th> the capacitance of the gate to the bulk is just that given above for 
metal or poly over oxide since the voltage on the gate merely depletes positive charge 
carriers back from the channel area. However, when the voltage on the gate reaches the 
threshold voltage of the transistor, negative charge is brought in under the gate oxide from 
the source of the transistor and the capacitance changes abruptly from the small valve 
associated with depleting charges in the bulk to the much larger oxide capacitance between 
the gate and the channel region. Further increase in voltage on the gate merely increases the 
amount of mobile charge under the gate oxide with no change in the width of the depletion 
layer underneath the channel. Hence, the character of the gate capacitance changes abruptly 
as the gate voltage passes through the threshold voltage. 

The dependence of the total gate capacitance on gate voltage is shown in figure 33. The 
capacitance from channel to bulk is completely separate from the gate to channel 



[Ch.2: Integrated System Fabrication: Sect.2] <Conway>fab2.vlsi January 27, 1978 10:04 AM 


capacitance. It is associated with the depletion layer underneath the channel region, and is 
almost identical to that of a diffused region of the same area. When the gate voltage is 
below threshold, the gate to channel capacitance disappears altogether leaving only the small 
parasitic overlap capacitances between the gate and the source and drain regions. 


Choice of Technology 

Before proceeding to the chapters on system design, let us briefly examine some alternative 
technologies. Using the knowledge developed in these first two chapters, we will discuss the 
reasons for selecting nMOS as the single technology used to illustrate integrated systems in 
this text. Some of the factors which must be considered in choosing a technology include 
circuit density, richness of available circuit functions, performance per unit power, the 
topological properties of circuit interconnection paths, suitablility for total system 
implementation, and general availability of processing facilities. 

As the technology advances, more system modules can be placed on the same sized chip. An 
ultimate goal is the fabrication of large scale systems on single chips of silicon. For this 
goal to be attained, any signal which is required in the system other than inputs, outputs, 
VDD, and GND, must be generated in the technology on the chip. In other words, no 
subsystem can require a different technology for the generation of its internal signals. Thus 
such technologies as magnetic bubbles are ruled out for full integrated systems because they 
are not able to create the signals required for all operations in the on-chip medium. 

We believe that for any silicon technology to implement practical large scale systems, it 
must provide two kinds of transistor. The rationale for this observation is as follows. In 
order to provide some kind of nonlinear threshold phenomenon there must be a transistor 
which is normally off when its control input is at the lowest voltage used in the system. 
Bipolar technologies use NPN transistors for this purpose. The nMOS technology uses n- 
channel enhancement mode devices. In addition to this transistor, a separate type of 
transistor must be supplied to allow the output of a driver device to reach the highest 
voltage in the circuit (VDD). In the bipolar technologies, PNP lateral devices are used to 
supply this function, in the n-channel technology a depletion mode device is used, and in 
complementary MOS technology a p-channel enhancement mode device is used. All three 
choices allow output voltages of drivers to reach VDD and thus meet the above criterion. 
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To date three technologies have emerged which are reasonably high in density and scale to 
submicron dimensions without an explosion in the power per unit area required for their 
operation. These are the n-channel silicon gate process, the complementary MOS silicon 
gate process, and the integrated injection logic, or I Z L, process . Although present forms of 
I L technology lack the additional level of interconnect available in the silicon gate 
technologies, there is no inherent reason that such a level could not be provided. It is 
important to note that increasing the flexibility of interconnect enriches the types of array 
functions which can be created. I^L has the advantage over nMOS that the power per unit 
area (and hence the effective r of its elementary logic functions) can be controlled by an 
off chip voltage. The decision concerning what point on the speed vs power curve to 
operate may thus be postponed until the time of application (or even changed dynamically). 


The nMOS scaling has been described previously. Any technology in which a capacitive 
layer on the surface induces a charge in transit under it to form the current control 
"transistor" will scale in the same way. Examples include Schottky Barrier Gate FET's 
(MESFETs), Junction FET's, and CMOS. 

There are certain MOS processes (VMOS, DMOS) of an intermediate form in which the 
channel length is determined by diffusion profiles. While competitive at present feature 
sizes, these are likely to be interim technologies which will present no particular advantage 
at submicron feature sizes. 

Scaling of the bipolar technology^ is quite different from that of MOS technologies. For 
completeness, we include here a discussion of the scaling of bipolar devices, which may be 
of interest to those familiar with those technologies. 

Traditionally, bipolar circuits have been "fast" because their transit time was determined by 
the narrow base width of the bipolar devices. In the 1950's, technologists learned how to 
form bipolar transistor base regions as the difference between two impurity diffusion 
profiles. This technique allowed very precise control of the distance perpendicular to the 
silicon surface, and therefore permitted the construction of very thin base regions with 
correspondingly short transit times. Since current in a bipolar device flows perpendicular to 
the surface, both the current and the capacitance of such devices are decreased by the same 
factor as the device surface dimensions are scaled down, resulting in no change in time 
performance. The base widths of high performance bipolar devices are already nearly as 
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thin as device physics allows. For this reason, the delay times of bipolar circuits is expected 
to remain approximately constant as their surface dimensions are scaled down. 

The properties of bipolar devices may be analyzed as follows. The collector current is due 
to the diffusion of electrons from emitter to collector. For a minority carrier density N(x) 
varying linearly with distance x, from N 0 at the emitter to zero at the collector (at x=d), the 
current I per unit area A is: 

I/A = q(2D)dN/dx = q(2D)N Q /d = q(2kT/q)^N 0 /d [eq.2] 

where the diffusion constant D = /xkT/q. The factor of two multiplies the diffusion 
constant in eq.2 because high performance bipolar devices operate at high injection level 
(injected minority carrier density much greater than equilibrium majority carrier density). 
The inherent stored charge in the base region is: 


Q/A = N 0 d/2 


[eq.3] 


Therefore, the transit time is: 

r = Q/I = d 2 /[4^ikT/q] [eq.4] 

The form of equation 4 is exactly the same as that for MOS devices (eq.l., chapter 1.), with 
the voltage in the bipolar case being equal to 4kT/q (at room temperature kT/q = 0.025 
volts). A direct comparison of the transit times is shown in Table 2. 



Table 2. 

Transit Time: 



t = (Distance) 2 /(MobilityXVoltage) 



MOSFET: 

MESFET, JFET: 

Bipolar: 

Distance: 

channel length 

channel length 

base width 

Voltage: 

- VDD/2, 

~ VDD/2, 

4kT/q 


many kl/q 

many kf/q 


Mobility: 

surface mobility. 

bulk mobility, 

bulk mobility, 

'y 

cmVv-see, (Si) 

-800 

~ 1300 

-1300 
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At the smallest dimensions to which devices can be scaled, the base width of bipolar devices 
and the channel length of FET devices are limited by the same basic set of physical 
constraints, and are therefore similar in dimension. The voltage on the FET devices must 
be many times kT/q to achieve the required nonlinearity. Hence at ultimately limiting 
small dimensions the two types of device have roughly equivalent transit times. At these 
limiting dimensions, choices between competing technologies will be made primarily on the 
grounds of the topological properties of their interconnects, the functional richness of their 
basic circuits, simplicity of process, and ability to control dc current per unit area. As 
supply voltages are scaled down to the 1 volt range, MOS devices become similar in most 
respects to other FET type devices, and it is possible that mixed forms (MOS-JFET, MOS- 
MESFET, Bipolar-MESFET, etc.) may emerge as the ultimate integrated system technologies. 

We have chosen to illustrate this text with examples drawn from the n-channel silicon gate 
depletion mode load technology. The reasons for this choice in 1978 are quite clear. In 
addition to meeting the required technical criteria we have described, this technology 
provides some important practical advantages to the student and to the teacher. It is the only 
high density technology which has achieved universal acceptance across company and 
product boundaries. Readers wishing to implement integrated system designs may have 
wafers fabricated by essentially any wafer fabrication firm, without fear that slight changes 
in the process or the vagaries of relationships with a particular firm will cut off their source 
of supply. It is also presently the highest density process available. This certainty of access 
to fabrication lines, the more generally widespread knowledge of nMOS technology among 
members of the technical community, its density, and its performance similarity with 
bipolar technology in its ultimate scaling, are all important factors supporting its choice for 
this text on VLSI Systems. However, the principles and techniques developed in this text 
can be applied to essentially any technology. 
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Notation-Two Phase Clocks-The Shift Register-Relating Different Levels of 

Abstraction-Implementing Dynamic Registers-Implementing a Stack-Register 

to Register Transfer-Combinational Logic-The Programmable Logic Array- 

Finite State Machines - - - towards a Structured Design Methodology 

The process of designing a large-scale integrated system is sufficiently complex that only by 
adopting some type of regular, structured design methodology can one have hope that the 
resulting system will function correctly, and not require a large number of redesign 
iterations. However, the methodology used should allow the designer to take full advantage 
of the architectural possibilities offered by the underlying technology. 

In this chapter we present a number of examples of data and control flow in regularized 
structures, and discuss the way in which these structures can be assembled into larger groups 
to form subsystems, and then these subsystems assembled to form the overall system. The 
design methodology suggested in this chapter is but one of many ways in which integrated 
system design may be structured. The particular circuit form presented does tend to produce 
systems of very simple and regular interconnection topology, and thus tends to minimize the 
areas required to implement system functions. Arrays of pass transistor logic in register to 
register transfer paths are used wherever possible to implement system functions. This 
approach tends to minimize power dissipated per unit area, and, with level restoration at 
appropriate intervals, tends to minimize the time delay per function. The methodology 
developed is applied in later chapters to the architecture and design of a data processing 
path and its controller, which together form a microprogrammed digital computer. 

Computer architects, who usually design systems in a rather structured way using 
commercially available MSI and LSI circuit modules, are often surprised to discover how 
unstructured is the design within those modules. In principle one can use the basic NAND 
and NOR logic gates described in Chapter 1 to implement combinational logic, build latches 
from these gates to implement data storage registers, and then proceed to design integrated 
systems using traditional logic design methodology as applied to discrete devices. Integrated 
systems are often designed this way at the present time. However, it is unlikely that such 
unstructured approaches to system design can survive as the technology scales down towards 
maximum density VLSI. 
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There are historical reasons for the extensive use of random logic within integrated systems. 
The first microprocessors produced by the semiconductor industry were fairly direct 
mappings of early generation central processor architectures into LSI. A block diagram of 
the Intel 4004, the earliest microprocessor to see widespread commercial application, is 
illustrated in figure la. The actual chip layout of the 4004 shown in Figure lb indicates the 
complexity of the LSI implementation of this simple central processing unit. Such LSI 
systems, directly mapping data paths and control functions appropriate in earlier component 
technologies, of necessity contained a great deal of random logic. However, the extensive 
use of random logic results in chip designs of very great geometrical and topological 
complexity, relative to their logical processing power. 

To deal with such complexity, system design groups have often stratified the design problem 
into architecture, logic design, circuit design, and finally circuit layout, with specialists 
performing each of these levels of the design. Such stratification often precludes important 
simplifications in the realization of system functions. 

Switching theory provides formal methods for minimizing the number of gates required to 
implement logic functions. Unfortunately, such methods are of little value in VLSI systems, 
since the area occupied on the silicon surface by circuitry is far more a function of the 
topological properties of the circuit interconnections than it is of the number of logic gates 
implemented. The minimum gate implementation of a function often requires much more 
surface area for its layout than does an alternative design using more transistors but having 
simpler interconnection topology. 

There are known ways of structuring integrated circuit designs implemented using 
traditional logic design methods. A notable example is the polycell 1 technique. In this 
technique, a group of standard cells corresponding to typical SSI or MSI functions are 
gathered into a library of functions. The logic diagram for the system to be implemented is 
used to specify which cells in the library are required. The cells are then placed into a chip 
layout, and interconnections laid out between them by an automatic interconection routing 
system. The polycell technique provides the logic designer having limited knowledge of 
integrated systems with a means of implementing modest integrated circuit designs directly 
from logic equations. However, a heavy penalty is paid in area, power, and delay time. 
Such techniques, while valuable expedients, do not lake advantage of the true architectural 
potential of the technology, and do not provide insight into directions for further progress. 
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Switching theory not only yields the minimum number of gates to implement a logic 
function, but it also directly synthesizes the logic circuit design. Unfortunately, at the 
present time there is no general theory which provides us with a lower bound on area, 
power, and delay time for the implementation of logic functions in integrated systems. 
Theoretical lower bounds for certain special structures and algorithms of interest are given 
in chapter 9. 

In the absence of a formal theory, we can at best develop and illustrate alternative design 
methodologies which tend to minimize these physical parameters. Proposed design 
methodologies should in addition provide means of structuring system designs so as to 
constrain complexity as circuit density increases. We hope that the examples and techniques 
presented in this text will serve to clarify these issues and stimulate others to join in the 
search for more definitive results . 


Notation 

There are a number of different levels of symbolic representation for MOS circuits and 
subsystems used in this text. Figures 2a., 2b., 2c., and 2d., illustrate a NAND gate at several 
such levels. At times it may be necessary to show all the details of a circuit's layout 
geometry in order to make some particular point. For example, a clever variation in some 
detail of a circuit's layout geometry may lead to a significant compaction of the circuit's 
area without violating the design rules. 

Often, however, a diagram of just the topology of the circuit conveys almost as much 
information as a detailed layout. Such stick diagrams may be annotated with important 
circuit parameters if needed, such as the L/W ratios shown in figure 2b. Many of the 
important architectural parameters of circuits and subsystems are a reflection of their 
interconnection topologies. 

Alternative topologies often lead to very different layout ureas after compaction. The 
discovery of a clever starting topology for a design usually provides far better results than 
does the application of brute force to the compression of final layout geometries. For this 
reason, main of the important structural concepts in this chapter and throughout the text 
will be represented for clarity by use of colored stick diagrams. The color coding of the 
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Fig.2c. NAND Gate: Circuit Diagram 
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stick diagrams is the same as that of layout geometries, and is as follows: green symbolizes 
diffusion', yellow symbolizes ion implantation for depletion mode transistors; red 
symbolizes poly silicon-, blue symbolizes metal ; black symbolizes a contact. 

Later, through a number of examples in chapter 4, we will present the details of procedures 
by which the stick diagrams are transformed into circuit layouts, and then digitized for 
maskmaking. Note that if this topological form of representation were formalized, one 
might consider "compiling" such descriptions by implementing algorithms which "flesh out 
and compress" the stick diagrams into the final layout geometries^, according to the 
constraints imposed by the design rules. 

When the details of neither geometry nor topology are needed in the representation, we may 
revert to the familiar circuit diagrams and logic symbols. At times we may find it 
convenient to mix several levels in one diagram, as shown in figure 2e. A commonly used 
mixture is: (i) stick diagrams in portions where topological properties are to be illustrated, 
(ii) circuit symbols for pull tips, and (iii) logic symbols, or defined higher level symbols, for 
the remaining portions of the circuit or system. 

We will define logic variables in such a way that a high voltage on a signal path 
representing that variable corresponds to that variable being true (logic-1). Conversely, a 
low voltage on a signal path representing that logic variable corresponds to the variable 
being false (logic-0). Here high voltage and low voltage mean well above and well below 
the logic threshold of any logic gates into which the signal is an input. This convention 
simplifies certain discussions of logic variables and the voltages on the signal paths 
representing them. Thus when w-e refer to the logic variable p being high, we indicate 
simultaneously that ft is true (logic-1) and is represented on the signal path named fi by a 
high voltage, one well above the logic threshold. In boolean equations and logic truth tables 
we use the common notation of 1 and 0 to represent true and false respectively, and by 
implication high and low voltages on corresponding signal paths. 
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Two Phase Clocks 

We will often make use of a particular form of "clocking” scheme to control the movement 
of data through MOS circuit and subsystem structures. By clocking scheme we mean a 
strategy for defining the times during which data is allowed to move into and through 
successive processing stages in a system, and for defining the intervening times during which 
the stages are isolated from one another. Many alternative clocking schemes are possible, 
and a variety are in current use in different integrated systems^. The clocking scheme used 
in an integrated system is closely coupled with the basic circuit and subsystem structuring, 
and has major architectural implications. For clarity and simplicity we have selected one 
clocking scheme, namely two-phase, non-overlapping clock signals. This scheme is used 
consistently throughout the text, and is well matched to the type of basic structures possible 
in MOS technology. 

The two clock signals <p 1 and <p 2 are plotted as a function of time in figure 3. The signals 
both switch between zero volts (logic-0) and a voltage near VDD (logic-1), and both have 
the same period, T. Note that both signals are non-symmetric, and have non-overlapping 
high times. The high times are somewhat shorter than the low times. Thus <p 2 is low all 
during each of those time intervals from when rises, nears VDD, and then falls back to 
zero. Symbolizing the rise of a signal qp as tqp, and the fall as T qp, we also have a similar 
rule for qp^, namely <P] = 0 all during each time interval from tqp 2 to T<p 2 - Therefore, at 
all times the logic and of the two signals equals zero: [<Pi(t)] • [qD 2 ( 0 ] = 0* f° r a11 For 
convenience, we will often use the following equivalence in our descriptions: "during the 
time period when <pj is high " = " during <p j". In the next section we will illustrate the use 
of these two clocking signals to move data through some simple MOS circuit structures. 
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The Shift Register 

Perhaps the mosl basic structure for movement of a sequence of data bits is the serial shift 
register, shown in circuit diagram form in figure 4a, The shift register is composed of level 
restoring inverters coupled by pass transistors, with the movement of data controlled by 
applying clock signals and qo 2 to the gates of alternate pass transistors in the sequence. 

Data is shifted from left to right as follows. Suppose a logic signal X is present on the 
leftmost input to the shift regiter when clock signal qp^ rises. Then, during the time when 
is high, this signal will propagate through the pass transistor and be stored as charge on 
the input capacitance of the first inverter stage. For example, if the signal X is low, then 
the inverter input gate capacitance will be discharged towards zero volts during the time 
when qp^ is high. On the other hand, if X is high, the inverter input capacitance will be 
charged up towards VDD - during qpp. 

When the clock signal <p^ falls, the pass transistor becomes an open circuit, isolating the 
charge on the input of the inverter. The second clock phase is now initiated by the rise of 
qc 2 . During the time interval when qp 2 is high the logic signal X, now inverted, will flow 
through the second pass transistor onto the gate of the second inverter. This pattern can be 
repeated an arbitrary number of times to produce a shift register of any length. 

Note that since the clock signals do not overlap, the successive pairs of stages of the shift 
register are effectively isolated from one another during the transfer of data between 
inverter pairs. For example, when qp^ is low, and qp 2 is high, all adjacent inverters connected 
by the <p 2 controlled pass transistors are in the process of transferring data from the left to 
the right members of the pairs. All these pairs of inverters are isolated from each other by 
the intervening cp^ controlled pass transistors which are all open circuits when qpj is low. 

It is also important to note that the shortest period, T, we can use for clocks controlling 
such data transfers is determined by the time required to adequately charge or discharge the 
inverter input gate capacitance through the pass transistor and the preceding stage pullup or 
pulldown. To this time must then be added an increment of time sufficient to insure that 
the clocks do not overlap. For more complex systems, the minimum clock period may be 
estimated as a function of basic circuit parameters as discussed in Chapter 1. 


Figures 4b and 4c illustrate the serial shift register using mixed notations. In figure 4b, each 
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inverter circuit diagram has been replaced by its logic symbol. In figure 4c, the pass 
transistor circuit symbols have been replaced by their stick diagrams. When visualizing the 
inverter, as represented by its logic symbol, in a circuit structure containing mainly stick 
diagrams, two points should be kept in mind: 

(i) The input to the inverter leads directly to the gate, and thus the gate capacitance, of the 
inverter's pulldown transistor. This input may be used to store a data bit by isolating the 
charge representing the bit with a pass transistor. Note that the input path will end up on 
the poly level within the inverter. A contact cut may thus be required to connect the poly 
gate and the metal or diffusion path on which the signal enters the inverter. 

(ii) Since the connection between the source and gate of the inverter pullup transistor 
requires a connection of all three conducting levels, the inverter output signal may easily be 
routed out on any one of the three levels. 

Identical serial shift registers can be stacked next to each other and used to move a sequence 
of data words, as shown in figure 5a. The simple structure in figure 5a anticipates the 
elegant topological simplicity of many important MOS integrated system functions. By 
connecting the successive inverter stages with diffusion paths, the pass transistors controlled 
by the clock signals are formed by simply running vertical clock lines in poly. The structure 
in figure 5a also anticipates another important point: topological simplification often results 
when control signals flow on lines that are at right angles to the direction of data flow. In 
this way as many bits as necessary can be processed in parallel with the same control signals. 

The example in figure 5a is so rudimentary it is perhaps difficult to visualize the two clock 
signals as actually containing control information. Let us consider a slightly more complex 
example, the shift-up register array shown in figure 5b. In this structure, each data bit 
moving from left to right during <p 2 has two alternative pass transistor paths through which 
it can proceed to the next stage: a straight through path, and a path which shifts it up to the 
next higher row. If the control signal SH is low, then [<p 2 * SH'] is high, and the straight 
through pass transistor paths are used during <p 2 . At the same time, [<p 2 • SH] is low, thus 
preventing data flow through the shift-up pass transistor paths. On the other hand, if SH is 
high, the straight through pass transistors are off and the shift-up pass transistor paths are 
used during <presulting in the entire data word being shifted vertically as well as 
horizontal!). Here the vertical control lines are run m metal, and the pass transistors are 
selectively formed by crossing the appropriate diffusion paths with short poly lines. 
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Fig.4c. Shift Register: More Mixed Notation 



Fig.5b. Shift-Up Register Array 
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Relating Different Levels of Abstraction 

In the discussions in this chapter, we will not have to make extensive calculations of the 
detailed electrical behavior of the devices and circuits involved in order to analyze the 
general behavior of digital logic constructed with these devices and circuits. Most of the 
examples presented in this chapter, and throughout the text, build upon the use of pass 
transistors coupling inverting logic stages as a means of structuring designs. The general 
results of chapter one provide the solutions to most device and circuit problems 
encountered, such as ratio and delay calculations, etc. In most cases, design concepts can be 
worked out using stick diagrams, and only at the stage of transforming the circuit topology 
into the detailed circuit layout geometry will these calculations need to be worked out, either 
by hand or with circuit simulation programs. 

It is important to simplify our mental model of integrated circuitry, so as to more quickly 
and easily analyze or explain to others the function of a given circuit, and more easily 
visualize and invent new circuit structures without drifting too far away from physically 
realizable and workable solutions. Of course, it is in general a dangerous practice to 
oversimplify our abstractions of electronic circuit behavior, and there are some nMOS 
circuits of deceptively simple appearance which have exceedingly complex behavior. 
However, throughout large portions of digital integrated systems, if the circuit and 
subsystem design is carefully structured as suggested in this text, an extremely simple mental 
model of device and circuit behavior will prove adequate to predict circuit and subsystem 
behavior. 

Figure 6a illustrates a simple way of visualizing the operation of successive inverting logic 
stages coupled by pass transistors. Assume for the moment that any pass transistors in the 
paths between stages are on. To visualize the time behavior of an inverter, and the effect of 
the pullup L/W to pulldown L/W ratio, imagine the flow' of current from VDD to GND as 
the flow of a fluid, and the inverter's two transistors as valves. The pullup transistor is 
always on, and so the upper "valve" is always open. However, the "valve" corresponding to 
the pulldown transistor may be either open or closed, depending on the amount of charge on 
its gate. 

In figure 6a. the input to inverter-A is a logic-0, so the pulldown of inverter-A is off, and 
the lower valve is closed. Current is thus diverted to the large charge storage site 
corresponding to the gale of the pulldown of inverier-B. If sufficient charge has flowed 




Fig. 6b. Successive Inverter Stages: Circuit Diagram and Logic Diagram 
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Fig. 6c. Successive Inverter Stages Connected Through a Pass Transistor 

[ Illustrating effect of the pass transistor ’’switch"] 


(dcf 6. press) 








[Ch.3: Data and Control Flow in Systematic Structures: Sect. 11 <Conway>dcfl.vlsi January 27, 1978 6:14 PM 


9 


onto this gate, corresponding to a high level of fluid in the tank representing the gate 
capacitance, then the pulldown of inverter-B is turned on, and thus the lower valve of 
inverter-B is open. If the lower valve in inverter-B is much larger than the upper one, 
corresponding to a practical ptillup to pulldown size ratio, then the pulldown of inverter-B 
can sink all the source current provided by the pullup. In addition, if given sufficient time 
and if the connecting pass transistor is on, the pulldown can drain off any charge stored on 
the succeeding inverter's input gate. Thus we can visualize the sequence of inversions of a 
logic signal propagating through successive inverter stages as an alternation between high 
and low levels of fluid in the storage tanks. In addition, we can visualize some of the time 
behavior of the signal's propagation: the larger the gate capacitances (storage tanks), the 
longer it takes to build up enough charge to open the next stage, and similarly the longer it 
takes to drain charge off the next stage and thus turn it off. 

Figure 6b represents the same physical circuit modelled in figure 6a, but on successively 
higher levels of abstraction. When analyzing circuit or logic diagrams showing successive 
inverting logic stages, as in figure 6b, one should keep the model of figure 6a in mind. 
Whether one is a novice or an expert in integrated system design, it is very helpful to 
compress the details of any given lower level of abstraction, so as to reduce the complexity 
of the problems presented at the next higher level, and enable the mind to span problems of 
larger scope. 

We are now' able to visualize a very simple model for the pass transistor: it is in fact like a 
valve, or "switch" in the path between an inverter and the next charge storage site, i.e. the 
input gate of the next inverter. Figure 6c shows two inverters coupled by a pass transistor, 
with the pass transistor informally symbolized as a "switch". In the upper diagram of figure 
6c, the pass transistor input is a logic-1, and so the "switch" is in the on position, resulting 
in the output Z being equal to the input X, after a suitable delay time At. Thus during the 
time the pass transistor gate input P = 1, the output Z(t) = X(t - At). Here At is some 
multiple of the transit time, t, of inverter pulldown transistor, as discussed in Chapter 1. 

In the low'er diagram of figure 6c. the pass transistor "sw'itch" is moved to the off position 
since P is a logic-0. Therefore, according to our model, the valve in the path is shut, and 
the charge, or lack of charge, is isolated in the storage site. Thus once the pass transistor 
valve is shut, / remains at a constant value, independent of changes in X: i.e., if P -*■ 0, at 
t = t () , then Z(t) = Z(t 0 ), for t > t Q . 
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These simple visualizations of the inverter and the pass transistor will carry us fairly far 
into LSI subsystem design. Several logic circuits in this chapter are drawn first in stick 
diagram form, and then informally sketched with pass transistors replaced with "switches", 
both to clarify the behavior of the circuits involved, and to further demonstrate the 
applicability of the model. 


Implementing Dynamic Registers 

Registers for the storage of data play a key role in digital system design. It is interesting to 
note that a group of adjacent inverters, with their gates isolatable by pass transistors, can be 
considered a form of temporary storage register. This arrangement is illustrated in figure 7, 
which shows two levels of symbolism for this dynamic register. Such a register is very 
simple in structure. It consists of only three transistors per bit position: the pass transistor 
and the two transistors of the inverter. However, this dynamic form of register will 
preserve data only as long as charge can be retained on the inverter input gates. Typically 
dynamic registers are used in situations where the input gate updating control signals are 
applied frequently. They are ideal in a clocked system in which they are reloaded every 
clock cycle, as in the shift register. 

Suppose we wish to construct a simple register which can be loaded during the appropriate 
clock phase under the control of a load signal, and which will retain its information 
through an indefinite number of successive clock periods until it is reloaded using the load 
signal. A one bit cell for such a register may be constructed using cross coupled inverters in 
the configuration shown in figure 8. This register cell is still dynamic in form, since it uses 
charge storage on the gate of the first inverter to preserve its state. However, it need not be 
loaded on every successive as was the simple register in figure 7. The pass transistor 
leading to it from the preceding stage is switched on only when both <p^ and LD are high. 
On any following <p^ when LD is low, the cell updates itself by the feedback path through 
the second pass transistor. Figure 9 illustrates a selectively loadable register composed of 
such cells. One important feature of this type of register is that it provides as output both 
the true and complemented forms of the stored data. This feature is often useful when the 
data is to be processed by a following network of combinational logic. 

While there are more elaborate forms of dynamic and static registers, the above two forms 
are sufficient for mam of the required data storage applications within integrated systems. 
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Fig. 7. A Dynamic Register 



Fig. 8. A Selectively Loadable Dynamic Register Cell 
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Fig. 9. A Selectively Loadable Dynamic Register 
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Implementing a Stack 

The ideas used to construct simple dynamic registers in the preceding section may be applied 
to the construction of more sophisticated and interesting subsystems. In this section we will 
describe the implementation of a stack. This type of subsystem is commonly called a last- 
in, first-out stack (LIFO), or pushdown stack, although we will diagram it horizontally 
rather than vertically. It is a shift register array with three basic operations: during each 
full clock period (1) we can push in a new data word at one end of the array, pushing all 
previously entered words one word position further into the array, or (2) we can leave all 
words in their current position, or (3) we can pop out a word from the end of the array, 
pulling all previously entered words back out by one word position. 

Figure 10a shows the structure of one horizontal row' of the stack. Here we have 
implemented a shift register which can perform the following three operations: shift data 
left to right, hold data in place, or shift data right to left. There are four control signals 
used, two of them being active during <p^ and two of them being active during qpj- The 
signals <pj and qp ^ are our familiar two phase, non-overlapping clock signals. 

In order for data to be shifted from left to right, the shift right control line (SHR) is driven 
high during qp|, followed by driving the transfer right control line (TRR) high during <p 2 - 
The bit of data appearing at the left is thus transferred by this operation onto the gate of 
the first inverter during (p^, and thence to the gate of the second inverter during <p^ In 
order for data to be held in place, the signal transfer left (TRL) is driven high during (pj 
and transfer right (TRR) is driven high during <p 2 , causing the data to recirculate upon itself 
without shifting. Note that the data can be obtained at any time from the output of the 
first inverter. However, since new data may come to the gate of the first inverter during <p^, 
the only safe time to take data out to the left is during <p 2 - The transfer of data from right 
to left is caused by driving the shift left control (SHL) line high during <p 2 * followed by 
driving transfer left (TRL) high during <pj. 

Figure 10b illustrates a possible topological structure of one horizontal row of the stack. 
There are two horizontal pathways on the diffusion level for shifting bits right or left. The 
two inverters for one stage of the row are nested between these paths. VDD, GND, and the 
four control lines run vertically in metal. The four pass transistors required for controlling 
the movement of data are conveniently implemented by short poly lines which cross the 
horizontal diffusion tracks at appropriate positions. Note that the entire row is composed 
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Fig.lOa. One Horizontal Row of the Stack 



Fig.10b. Topology of One Horizontal Stack Row 
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of 180° rotations and repetitions of a basic cell containing one inverter. 

In a typical implementation of the complete LIFO stack, a number of such rows run parallel 
to each other in the horizontal direction. The number of rows is equal to the width in bits 
of the data words involved. The control lines run vertically across the entire stack, 
perpendicular to the direction of data flow. For data words of any substantial width, the 
capacitive loading on the control signals would be sufficient to warrant use of super-buffer 
drivers. 

The stack as a whole may be controlled with only two logic signals: one signalling push , and 
the other signalling pop. The activation of neither of these two signals causes data to 
recirculate in place, awaiting the next active instruction. 

Let us consider how to derive, from push and pop, the control signals for driving the four 
control lines SF1R, TRR, SHL, TRL. A possible scheme is shown in Fig. 10c. We use 
random logic for this purpose since only a few gates are required to control the large, 
regular array of circuit ceils in the stack. The operation which determines what the stack 
will do during the subsequent clock phase is brought in on the path labeled OP. It is 
important to note in the following that only one signal path (OP) is required to bring in 
both push and pop logic signals, since these are active on mutually exclusive clock phases. 

The control scheme is summarized in the timing diagrams in figure lOe. Flere we see that 
holding OP high during <p 2 , followed by low during tpj, implements push. Holding OP low 
during both <p L and <p 2 causes the data to recirculate in place. Holding OP high during <p lf 
followed by low during <p 2 , implements pop. Thus, the single signal path, OP, is sufficient 
to carry both stack control signals into the stack. 

During (pj, the OP signal is fed through the upper pass transistor into the inputs of the two 
NOR gates g} and g 2 - The outputs for these two NOR gates are low during this period, 
since <pj is high. 

If the incoming OP signal is high while <pj is high, then the lower input of NOR gate g 2 
will be low. Thus when <pj falls low, the output of g 2 will go high, thereby driving SH I 
high. If the OP signal is instead kept low while tpj is high, then the output of the NOR gate 
g, will go high on the fall of tpj, thereby driving TRR. high. 




Fig. lOd. Stack Geometry and Interconnect Topology 
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Fig. lOe. Stack control Signal Timing Diagrams 
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During ihe period when <p2 is high and either the shift left (SHL) or the transfer right 
(TRR) operation is being executed, the signal on the OP line is being stored on the 
corresponding input gates of the lower two NOR gates, and g 4 . Thus, if OP is high while 
(P2 is high, a logic -0 is stored on the input of the NOR gate g 4 , and during the subsequent 
high period, SHR will be driven high. Conversely, if OP is low while (p 2 high, TRL 
will be driven high during the following <p^ high period. 

This kind of control scheme recognizes that there must be a lull period between any 
operation and its next occurrence. Control information is taken in during this period and 
set up for the subsequent operation. The scheme takes advantage of these lull periods, when 
possible, to perform other operations which can be done without conflict. It is an example 
of a fundamental design technique which can be extended to larger system structures. 

When planning the overall architecture of a larger system, it is often useful to represent 
subsystems, such as the stack, using a higher level of symbolism. To be truly useful, such 
representations should, in addition to a functional definition, include the topological factors 
associated with the interconnection points of the subsystem and the geometrical factors of 
its shape and relative physical dimensions. 

A system level sketch of one particular implementation of the stack is shown in figure lOd. 
Identical driver circuitry is placed along the top and bottom edges of the shift register array. 
The transfer right and shift left drivers which are set up during ( anc * act -i ve during <p 2 ) 
are placed along the top of the shift register array. The transfer left and shift right drivers 
which are set up during qp2 ( and active during <pj) are placed along the bottom of the array. 
This choice has made the two tinting signals local to the drivers which they control. The OP 
bit is required on both the top and the bottom of the shift register array. Equivalently, push 
is required only on the bottom of the array, while pop is required only on the top. 

The integration of this subsystem into a larger integrated system design will require that the 
data in and out paths be matched to those of subsystems to which the array is connected, 
and that the <p lt qp 2 , and OP signals be available at either the left or right side of the array. 
By using system level representations that reflect as closely as possible the dimensions and 
locations of critical signals in all major subsystems, the interactions between topologies and 
dimensions of the subsystems can be assessed. The feasibility of an overall system 
architecture can thus be ensured prior to detailed design and layout. 
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Register to Register Transfer 

From an implementation point of view it is often desirable to combine logic steering 
functions with the clocking of data into registers, since both require pass transistors as their 
elementary functional unit. An example is the shift-up register array shown in figure 6. 
From the next higher level system view, however, it is desirable to separate the two 
functions conceptually. In Fig. 11a we have shown some combination of inputs, X0 through 
Xn going through some combination of pass transistors, which may or may not have logic 
functions attached, into the input gates of some inverting logic elements. This combination 
of function is then abstracted into a register clocked on the phase during which the input 
pass transistors are turned on. Any logic function associated with the input pass transistors 
is considered part of the preceding combinational logic module. This viewpoint is an 
extension of the concept of dynamic register previously developed in figure 7. 

Using this notation, any processing function can be built up using blocks of the form shown 
in Fig. lib. Here we have a clocked input register, a block of strictly combinational logic 
with no timing attached , and an output register clocked on the opposite phase. In this case 
the inputs are stored in the input register during <pp They then propagate into and through 
the combinational logic (C/L), with the resulting outputs stored in the output register 
during yj- Any single data processing step can be viewed as a transfer from one such 
register to a second through a combinational logic block. 

A sequence of such operations can be performed on a data stream by a series of such 
combinational blocks separated by registers as shown in Fig. 11c. Since different sets of 
data words in the stream may be operated upon at the same time, but at different locations, 
this data path is a type of pipelined processing structure. Such pipelined processing 
structures offer the opportunity for improved processing bandwidth by performing many 
different operations concurrently. Notice that the throughput rate of such a pipeline system 
of register to register transfer operations is limited by the delay time through the slowest of 
the combinational logic blocks. If no registers had been interposed between the function 
blocks, and each operand set separately run through the entire sequence of combinational 
logic modules, the throughput rate would be much lower. 

In line with the ideas developed earlier in this chapter, the detailed functions performed by 
the combinational logic modules may often be implemented in circuit structures of very 
simple and regular topology. Control signals will in general cross the data path at right 




Fig. 11c. General Form for a Data Path 
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angles to the direction of data flow. Figure 11c illustrates sets of such control inputs as 
lines carrying the control function OP^ into the first C/L module. 112 lines carrying OP 2 
into the second, etc. 

The idea of data being processed while passing through combinational logic interspersed 
between register stages in a sequence of register to register transfers is a basic and 
important concept in the hierarchy of digital system architecture. We have already 
described the implementation of registers. The next sections will describe some ways to 
implement combinational logic functions. 


Combinational Logic 

Combinational logic modules contain no data storage elements. The outputs of a 
combinational logic module are functions only of the inputs to that module, provided that 
sufficient time has been allowed for those inputs to propagate through the module's 
circuitry. 

In integrated systems, combinational logic design problems will typically fall within one of 
three general classes. The first is w'hen a small amount of simple logic is required, for 
example to derive control signals at the periphery of a system module (as in the stack 
control signal generation) or to implement a simple function within a single circuit cell 
(which may then be replicated in a regular array). In these cases, traditional logic design 
procedures using static nand and nor gates can be applied. Such designs involving a few 
gates are usually rather simple, and can be produced by inspection rather than by use of 
formal minimization and synthesis procedures. Even in these simple cases, the minimum 
static logic gate implementation does not necessarily result in either the most regular, the 
minimum area, the minimum delay, or the minimum power design. In fact, we often find 
alternative techniques to the use of static logic gates, which in specific instances lead to 
"better" designs by one of these measures than would minimum gate implementations. For 
example, figure 12a shows a selector logic circuit (1. Sutherland), in which one of the inputs 
Sf, S 2 , S 3 , S 4 is selected for output by the control variables A, and B according to the 
function: 


7 = SjA'IV + S 2 AB' + S 3 A’B + S 4 AB 
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This selector circuit is composed simply of poly paths crossing diffusion paths. Where 
depletion mode transistors are placed, the diffusion level path is always connected, thus 
placing control in the selectively located enhancement mode pass transistors, which function 
as simple switches. Figure 12c shows the circuit's paths from inputs to outputs using the 
"switch" abstraction for each of the pass transistors. For each possible combination of 
values of A and B, there is a path through the selector to Z from only one of the inputs Sj. 
For the specific inputs shown in the example in figure 12c, the signal $2 propagates through 
to Z since both A and B' are high. Note that no static power is consumed by the circuit, 
and the area occupied by the circuit is minimal since no contact cuts are required within it. 
In chapter 5 we describe a very general and powerful arithmetic logic unit (ALU) which 
uses an array of such selector blocks to control a pass transistor carry network. 

The second general class of combinational logic design problems are those rather complex 
functions for which clever ways of structuring topologically regular implementations have 
been discovered. As an example, consider the implementation of a (ally function. This 
function has n inputs and n + 1 outputs. The k lh output is to be high, and all other outputs 
low, if k of the inputs are high. The boolean equations representing this function for the 
simple case of three inputs are: 

z 0 = xyxyxy 

Zi ■ *i>W ♦ x 1 'x 2 x 3 ' ♦ x 1 -x 2 'x 3 

Z2 = X1X 2X 3’ + X jX2*X3 + X^'X2X^ 
z 3 = XjX 2 X 3 

If this function were designed with random logic consisting of active pullup, static logic 
gates, it would result in a topological kludge. Figure 12b shows a topologically regular 
implementation of the tally function. A major portion of the function is implemented 
using a regular array of identical cells each containing only two pass transistors. The design 
is based on the shift-up register idea presented earlier. A high signal propagates through the 
array from the pullup at the lower left. Whenever one of the variables Xj is high, the 
propagating high signal moves up to the next higher horizontal diffusion level path. Thus 
the number of paths it moves up equals the number of inputs Xj which are high. Logic-0 
signals propagate through the array from the ground points to all other outputs. 
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Figure 12d shows the paths from inputs to outputs for this tally circuit, using the "switch" 
abstraction for the pass transistors. The figure shows a specific example of a set of inputs 
controlling the pass transistors of the circuit. Since two of the inputs are high, the logic-1 
signal is shifted up two rows and emerges at Zj. 

This tally function design can be easily expanded to handle more than three inputs by 
simply extending the array structure upwards and to the right. However, remember that the 

■y 

delay through n pass transistors is proportional to n . Thus it may be necessary to insert 
level restoration prior to such extension. Similar comments apply to the extension of the 
selector circuit previously shown, or other pass transistor logic arrays one might invent. 

The electronic logic gates traditionally used in digital design are unilateral elements: they 
allow a logic signal to propagate in one direction only. It should be noted that the pass 
transistor is a bilateral circuit element. It permits the flow' of current, and thus the passage 
of a logic signal, in either direction w'hen its gate is high. While this property of the pass 
transistor is not necessarily of fundamental importance in integrated systems, it is an 
interesting and occasionally useful one. 

Early relay switching logic used switching contacts which were bilateral elements. 
Interesting discussions of relay switching logic are contained in both references R4 and R5. 
The tally array example just given is a basic symmetric network mapped directly into nMOS 
from relay switching logic (see R5, p.241). The mathematics of switching universally used 
in digital systems today was proposed by Claude Shannon (R7) in 1938. Shannon 
demonstrated that the calculus of propositions, based on the algebra of logic developed by 
Boole (R8), was directly applicable to relay switching circuits. 

The third and final combinational logic design situation occurs when a complex function 
must be implemented for which no direct mapping into a regular structure is known. 
Methods for handling this situation are the subject of the next section. 

In the design methodology developed in this text, the combinational logic between stages in 
the register to register transfer paths is often done by operations on the charge moving 
between stages, using pass transistors to perform these operations. Many researchers at the 
present time are searching for alternative structures and lechnicjues foi perfoiming these 
elementary functions, including the use of charge transfer devices^. 
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I'he Programmable Logic Array 

On many occassions it is convenient to implement the combinational logic interspersed 
between register stages with regular structures of pass transistors. However, we will often 
encounter important combinational logic functions which do not map well into such regular 
structures. In particular, combinational logic used in the feedback paths of finite state 
machines is often highly complex and inherently irregular. Also, we may wish to delay 
binding the details of the logic functions used in finite state machine sequencing until most 
of the design is complete. If the combinational logic were implemented in an irregular 
structure, such changes could require a major redesign. 

Fortunately, there is a way to map irregular combinational functions onto regular structures, 
using programmable logic arrays (PLA’s) as described in this section. This technique of 
implementing combinational functions has a great advantage: functions may be significantly 
changed without requiring any major design or layout changes of the PLA structure. 

One very general and regular way to implement a combinatorial logic function of n-inputs 
and m-outputs is to use a memory of 2 n words of rn-bits each. The n-inputs form an 
address into the memory, and the m-outputs are the data contained in that address. Such a 
memory implements the full truth table for the output functions. Many systems are in fact 
built using memories as combinational logic elements. A common form of memory for this 
purpose is the read-only memory (ROM) where the data is permanently placed in the 
memory by a mask pattern, or by electrically altering the individual bit positions. There is 
one major difficulty with this approach: it is often the case that most of the possible input 
combinations cannot occur, due to the nature of the specific problem. Staled another way, 
many combinational logic functions require only a small fraction of all 2 n product 
minterms for a canonical sum of products implementation. In such cases, a ROM is very 
wasteful of area. 

The programmable logic array (PLA) is a structure which has all the generality of a 
memory for implementing combinational logic functions. However, any specific PLA 
structure need contain a row of circuit elements only for each of those product terms that 
are actually required to implement a given logic function (the essential prime intplicanis; 
R4. Ch.4). Since it does not contain entries for all possible mimerms, it is usually far more 
compact than a ROM implementation of the same function. To achieve full compaction. 
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the various output functions must be jointly minimized before the PLA layout pattern can 
be defined. However, such minimization is not essential. Less than full compaction 
increases the independance of the different entries, so that changes in function may require 
only local changes in the PLA. 

An illustration of the overall structure of a PLA is shown in figure 13a. The diagram 
includes the input and output registers, in order to show how easily these are integrated into 
the PLA design. The inputs, stored during in the input register, are run vertically 
through a matrix of circuit elements called the AND-plane. The AND-plane generates 
specific logic combinations of the inputs and their complements. The outputs of the AND- 
plan leave at right angles to its inputs and run horizontally through another matrix called 
the OR-plane. The outputs of the OR-plane then run vertically and are stored in the output 
register during qp 2 . 

The circuit diagram of a specific programmable logic array is shown in figure 13b. This 
diagram will help to clarify the structure and function of the AND and OR-planes of the 
PLA. The input register bit for each input path is formed by a pass transistor clocked on 
<Pl leading to both inverting and non-inverting super buffers. These buffers drive two lines 
running vertically through the AND-plane, one for the input term and one for its 
complement. The outputs of the AND-plane are formed by horizontal lines with pull-up 
transistors at their leftmost end. The function of the PLA's AND-plane is then determined 
by the locations and gate connections of pull-down transistors connecting the horizontal 
lines to ground. 

Each output running horizontally from the AND-plane carries the NOR combination of all 
input signals which lead to the gates of transistors attached to it. For example, the 
horizontal row labelled R 3 has three transistors attached to it in the AND-plane, one 
controlled by A, one by B and one by C. If any of these inputs is high, then R 3 will be 
pulled down towards ground and will be low. 

Thus, R 3 = (A + B + C')' = A'B'C. Similarly, R 4 = (A + B' + C)' = A’BC. 

Ihe OR-plane matrix of circuit elements is identical in format to the AND-plane matrix, 
but rotated 90 degrees. Once again, each of its outputs is the NOR of the signals leading to 
the gales of all transistors attached to it. In figure 13b for example, both R 3 and R 4 lead to 
the gales of transistors leading from the output line Z 4 ' to ground. If either R 3 or R 4 is 




Fig. 13a. Overall Structure of the PLA 
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high , Z 4 ' will be low. Thus, Z 4 ' = NOR(R 3 ,R 4 ) = (A'B’C + A'BC')‘ . Up to this point the 
PL A implements the NOR-NOR canonical form of boolean function of its inputs. 

The output lines of the OR-plane matrix are run into an output register formed by pass 
transistors (clocked on yj) leading into inverting drivers. Note that the output Z 4 at this 
point is: Z 4 = A'B'C + A'BC. This expression illustrates why the two PLA planes, each 
implementing the NOR function, are usually referred to as the AND and OR-planes. 
Following the output register, the outputs appear directly as the sum of products canonical 
form of boolean functions of the PLA inputs, that is, the OR of AND terms. Each 
horizontal line of the PLA carries one product term. The number of horizontal lines is 
minimized when the product terms they carry are the essential prime implicants of the 
output functions. 

Figure 13c shows one possible layout topology for implementing the PLA in nMOS 
circuitry. The example is the same circuit illustrated in figure 13b. The input lines crossing 
each plane are run in poly. The output lines from each plane are run in metal. Paths 
running to ground are placed between alternate poly lines, on the diffusion level. It is then 
a simple matter to form the pulldown transistors connecting the metal output lines to 
ground. They are selectively located diffusion lines under the appropriate input poly lines. 

Although the PLA may implement a very irregular combinational function, the irregularity 
is confined to the irregular locations of pulldown transistors which "program” the function. 
The overall structure and topology of the PLA are very regular. Note that its overall shape 
and size is a function of the parameters: (i) the number of inputs, (ii) the number of 
product terms, (iii) the number of outputs, and (iv) the length unit A. 
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Fig.13c. Stick Diagram of PLA Example 
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Finite State Machines 

In many cases in the processing of data, it is necessary to know the outcome of the current 
processing step before proceeding with the next. Results of the current step may be used as 
inputs in the next step. The configuration shown in figure 14a can be used to implement a 
processing stage having this requirement, A typical register to register transfer stage has 
been modified by simply feeding back some of its outputs to some of its inputs. This 
structure implements a form of sequential machine known as a finite state machine. 

The feedback signals form a binary number which may be regarded as identifying the state 
of the machine. The value of this number is stored, along with the external inputs, in the 
first register during These combined inputs then propogate through the combinational 
logic. The resulting outputs are stored in the second register during The falling edge of 
<p-) must occur a sufficient time later to insure that all signals have propagated through! the 
combinational logic. Each qpf - <p 2 cycle results in two new sets of outputs: (i) the external 
outputs which are typically used for controlling other units of the system, and (ii) a new 
feedback number, which defines the next state of the machine. This process repeats during 
each clock period. The number of possible states is determined by the number of bits in the 
feedback path, and is finite. 

There are a number of ways of abstractly representing the states, the required state 
transitions, and the outputs of sequential machines under given input sequences. Possible 
representations include state diagrams, transition tables, boolean difference equations, etc. 
A large body of theory has been developed concerning the synthesis and analysis of 
sequential machines. The serious reader will benefit from a further study of the results of 
switching theory on this subject (R3, R4). 

Implementations of simple finite state machines are used to produce the very lowest level of 
system control sequencing, since they can autonomously generate control sequences. The 
sequential machine having a finite number of states is a very important element in the 
hierarchy of fundamental concepts used in integrated system architecture. 

The configuration shown in figure 14a implements a synchronous machine, since the 
feedback loop is only activated at times determined by the clock signals. In any clock 
period k, the output terms Zj and the next state terms Y f are valid during qp^k). They are 
functions of the external inputs Xj and feedback terms Yf which were valid during <pj(k-l). 
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If a sequential machine contains a feedback loop which is continuously active, then it may 
begin a response to a change in inputs or state at any time, rather than just at fixed clock 
times. Such a sequential machine is referred to as an asynchronous sequential machine. 
The analysis of asynchronous sequential machines and their implementation is far more 
complex than that of synchronous ones. Great care must be exercised to avoid any 
difference in state sequencing and outputs under arbitrary differential delays of signals 
through the circuit paths of such machines (R3. Ch.5). There will be only a few special 
cases where we use the asynchronous form of sequential machine (Chapter 7), and these will 
be subject to detailed analysis. 

Where sequential machines are required within integrated systems, we will generally 
implement them in synchronous form. Synchronous machines are rather easy to implement 
correctly, and fit naturally into the two phase clocking scheme used for moving data around 
within our systems. 

However, the reader should carefully note that an implementation of a synchronous 
sequential machine functions correctly only if the delays in the circuit paths are sufficiently 
short compared to the clock period. If we were to implement many copies of a particular 
machine, the probability of correct function for any given copy is thus a function of (i) the 
clock period used, and (ii) the distribution of differential delays in that copy’s signal paths. 
Our estimate that a particular copy will function correctly is thus based in part on 
assumptions about the ratio of likely deviations in circuit delays to the clock period. A 
discussion of delays in MOS circuits is given in chapter 1. 

There is a very straightforward way to implement simple finite state machines in integrated 
systems: we use the PLA form of combinational logic and just feed back some of the 
outputs to the inputs, as illustrated in figure 14b. The circuit's structure is topologically 
regular, has a reasonable topological interface as a subsystem, and is of a shape and size that 
are functions of the appropriate parameters. The function of this circuit is determined by 
the "programming” of its PLA logic. If, for example, early in a design cycle there is some 
uncertainty in the details of the desired sequencing of such a circuit, it is easy to provide 
layout space for extra, unused inputs, mimerms, or outputs as contingencies. 

The following simple example will help illustrate the basic concepts of finite state machines 
and their implementation in nMOS circuitry. A busy highway is intersected by a little used 
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farmroad, as shown in figure 15a. Delectors are installed which cause the signal C to go 
high in the presence of a car or cars on the farmroad at the positions labelled C. We wish 
to control traffic lights at the intersection, so that in the absence of any cars waiting to 
cross or turn left on the highway from the farmroad, the highway lights will remain green. 
If any cars are detected at either position C, we wish the highway lights to cycle through 
caution to red, and the farmroad lights then to turn green. The farmroad light is to remain 
green only while the detectors signal the presence of a car or cars, but never longer than 
some fraction of a minute. The farmroad light is then to cycle through caution to red, and 
the highway light then to turn green. The highway light is not to be interruptible again by 
the farmroad traffic until some fraction of a minute has passed. 

A state diagram model of a finite state machine to control the lights is sketched in figure 
15b. This diagram identifies four possible states of the machine, and indicates the input 
conditions which cause all possible state transitions. A block diagram of the PLA circuit 
implementing the machine is shown in figure 15c. The circuit uses the signal C as an input, 
and provides outputs HL and FL which encode the colors of the highway and farmroad 
lights it controls. Note that a timer is used to provide, as controller inputs, the short and 
long timeout signals (TS, and TL), at appropriate times following a start timer (ST) signal 
output from the controller. This timer could be implemented as a digital counter in the 
same nMOS circuitry. Another abstract model describing the desired function of the 
controller is given in the state transition table in figure 15d, which contains similar 
information to that in the state diagram. 

The detailed sequencing of the machine under various input sequences is described by both 
the state diagram and transition table models of the controller. Consider starting in the 
state HG, where the highway lights are green. The machine remains in state HG as long as 
either no cars are detected or the long timeout has not occurrred, in other words as long as 
(C)and(TL) = 0. After the long timeout occurs, if any cars are detected, the machine 
restarts the timer and changes state to HY, where the highway lights are yellow. It remains 
in state HY only until the short timeout occurs, and then restarts the timer and changes to 
state FG, w'here the farmroad lights are green. It remains in state FG until either no cars 
are detected or the long timeout occurs, i.e. (C)’OR(TL) = 1. Then it restarts the timer and 
changes to slate FY, where the farmroad light is yellow. It remains in state HY only until 
the short timeout occurs. It then restarts the timer and changes to state HG, the starting 


state. 
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The locations of transistors in the PL A light controller circuit can be determined by "hand 
assembling” the ’’program" specified in the "symbolic" transition table in figure 15d, 
resulting in the encoded stale transition of figure 15e. First we assign codes to the states. 
In the example: state HG is encoded as (Yq.Y^) = (0,0); HY as (0,1); FG as (1,1); and FY 
as (1,0). Next, we assign codes to the output light control signals: green is encoded as (0,0), 
yellow as (0,1), and red as (1,0). We now form the encoded state transition table by 
constructing one row for each product term implied by the original symbolic table of figure 
15d. A row in table 15d specifying a state transition as a function of a single input variable 
or single product term of input variables produces a single row in table 15e. A row in table 
15d specifying a state transition as a function of a sum or sum of products of input 
variables, leads to a corresponding number of rows in table 15e. 

Placement of the transistors within the PLA matrices follows directly from the encoded 
state transition table: 

(i) For each logic-1 in the next state and output columns in the table, we run a diffusion 
path from the corresponding next state or output line in the PLA OR-plane, under the 
corresponding product term line, to ground. This creates a transistor controlled by the 
product term line. Then, if that controlling product term line is ever high, the path to the 
output inverter will be low, and the output will be high. The output line will be low unless 
some product term line controlling it is high. 

(ii) For each logic-1 in the input and present state columns in the table, we run a diffusion 
path from the corresponding product term line, under the corresponding inverted input or 
next state line in the PLA AND-plane, to ground. The transistor thus created is controlled 
by the inverted input or next state line. Unless that controlling line is high, the product 
term line will be low. 

(iii) For each logic-0 in the input and present state columns in the table, we run a diffusion 
path from the corresponding product term line, under the corresponding non-inverted input 
or next state line in the PLA AND-plane, to ground. The transistor thus created is 
controlled by the non-inverted input or next state line. Unless that controlling line is high, 
the product term line will be low. 

Note that if all lines which control the transistors connecting a given product term line to 
ground are low. then that product term line will be high. Otherwise it will be low. 
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Fig. 15e. Encoded State Transition Table for the Light Controller 
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The PI.A circuit in figure 15f is programmed from the transition table in figure 15e, 
according to the rules above, and implements the traffic light controller. Note that this LSI 
implementation does not exactly strain itself to meet the time response requirements of the 
control problem: it can run at a clock rate at least 10^ times as fast as required. Also, note 
that the PLA controller is roughly (150A) Z in area. Using the 1978 value of A = 3/jtni, this 
controller is (450juni)^ ~ 0.002 enr in area. A PLA controller this size may contain over 
150 transistors, but occupies only l/125 th of the area of a typical 0.25 ent^ silicon chip in 
1977. By the late-80's, as A scales down towards its ultimate limits, such a controller will 
require only ~ 1/25,000 th of the area of such a chip. 

As we will see in later chapters, a data processing machine of any desired complexity can be 
created by interconnecting register to register data processing paths constructed along the 
lines of that shown in figure 11c, such paths being controlled by finite state machines 
implemented as shown in figure 14b. The data paths form the "highways" for the 
movement of data, under control of the finite state machine "traffic controllers". 
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Towards a Structured Design Methodology 

The task of designing very complex systems involves managing, in some highly structured 
way, the space and time relationships between the various levels of system building blocks so 
that the entire system will function as intended when it is finished. The beginnings of a 
structured design methodology for VLSI systems can be produced by merging together in a 
hierarchy the concepts presented in this chapter. Designs are then done in a "top down” 
manner, but with a full understanding by the architect of the successive lower levels of the 
hierarchy. 

To begin, we plan our digital processing systems as combinations of register to register data 
transfer paths, controlled by finite state machines. Then the geometric shapes, relative sizes, 
and interconnection topologies of all subsystem modules are collectively planned so all 
modules will merge together snugly, with a minimum of space and time wasted by random 
interconnect wiring. Storage registers are typically constructed by using charge stored on 
input gates of inverting logic. The combinational logic in the data paths is typically 
implemented using steering logic composed of regular structures of pass transistors. Most of 
the combinational logic in the finite state machines is typically implemented using PLA’s. 
All functioning is sequenced using a two-phase, non-overlapping clock scheme. 

When viewed in its entirety, a system designed in this manner is seen as a hierarchy of 
building blocks, from the very lowest level device and circuit constructs, on up to and 
including the high level system software and application programs in which the intended 
functions of the system are finally expressed. Individuals who understand the key concepts 
of each level in this hierarchy will recognize that the boundaries between levels are rather 
elastic ones. Each level of activity might best be optimized not on its own as a specialty, 
but as it fits into an overall systems picture. For example, the activity "logic design" in 
integrated systems might best be conceptualized as the search for techniques and inventions 
which best couple the physical, topological, and geometric properties of integrated devices 
and circuits with the desired properties of digital VLSI systems. The search for alternative 
components for any given design hierarchy, and the search for alternative hierarchies, will 
be done best by those who span more than one specialty. 

A particularly uniform view of such a system of nested modules emerges if we view every 
module at every level as a finite state machine or data path contioiled by a finite state 
machine. At the lowest level, elements such as the slack and register cells may be viewed as 
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state machines with one feedback term (the output), two external inputs (the control 
signals), and a one bit state register. These rudimentary state machines are grouped in a 
structured manner to form portions of a state machine, or data path controlled by a state 
machine, at the next level of the hierarchy. Structured arrays of identical state machines 
often provide a mechanism for distributing processing among memory cells 1 ' 0 , thus 
enabling vast increases in processing bandwidth. Although in some cases the feedback paths 
are used in rather specialized ways, the state machine metaphor still provides a precise 
description of module behavior. The entire system may thus be viewed as a giant hierarchy 
of nested machines, each level containing and controlling those below it. A detailed 
quantitative treatment of certain hierarchically organized machines is given in chapter 9. 

In chapters 5 and 6 we will apply the design methodology developed in this chapter to the 
design of a digital computer system. A one chip implementation of the data path portion of 
this computer system is illustrated in the frontispiece. Consistent use of the described 
design methodology resulted in a design of great regularity, short delay times, low power 
consumption, and high logical processing capability. As we will see in chapter 4, regular 
designs, with small numbers of basic circuit cell types replicated in two dimensions to form 
subsystems, also have significant implementation advantages over less structured designs. 
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This chapter presents the basic concepts involved in implementing integrated system designs, 
from the system designer's point of view. Tools are described which help the designer 
produce the geometrical layout patterns for each layer of an integrated system given the 
logic, circuit, or topological level design of the system. Procedures are described for 
encoding these layout patterns and then using the encoded layouts in the patterning and 
fabrication processes to implement the integrated system. In addition, we discuss how 
design tools and procedures are likely to evolve towards fully integrated design systems, 
under the influence of increased complexity of design and predictable changes in the 
technologies of implementation. 

To enable groups of readers to actually design moderate sized LSI systems, we've included 
descriptions of easily constructed LSI design tools and procedures for organizing and 
implementing LSI multi-project chips. In each case, the tools are described as part of a 
complete system of design and implementation procedures, some of which are performed 
manually while others are machine assisted. Those experienced in software system design 
will recognize that construction of the machine-assisted portions of these systems is fairly 
straightforward. Contrary to what many may think, designing your own LSI projects, 
merging them onto collaborative multi-project chips, and having these implemented by 
commercial maskmaking and wafer-fabrication firms is now well within the computational 
and financial reach of most industrial R&D groups and university EE/CS departments. 


We are firm believers in learning by cluing , and hope that the information provided in this 
chapter will help and encourage-many groups of readers to try their hand at building LSI 
design tools and designing LSI systems. Such first-hand experience will lead to a deeper 
understanding of the remaining material, in this text. 
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An overview of the stages of integrated system design, layout, and implementation is given 
in figure 1. The designer first transforms the circuit and topological level designs into a 
geometrical layout of the system, using procedures described later in this chapter. In order 
to optimize the layout, perform various design checks, and discover errors, the designer 
usually iterates several times between design and layout. The result is a set of design files 
describing the layout. These files are in a particular representation called an intermediate 
form, which efficiently and unambiguously describes the layout geometry. 

The design files are then converted into files for driving the chosen patterning mechanism. 
At present, design files are commonly converted into pattern generator (PG) files, for use 
by a maskmaking firm for driving an optical pattern generator, the first step of 
mask making. By a sequence of photolithographic steps, the mask house produces a set of 
masks, which a commercial wafer fabrication firm may then use to pattern silicon wafers. 
Each finished wafer contains an array of system chips. The wafers are then diced into 
separate chips, which are packaged and tested to yield working systems. 

From the system designer's point of view, maskmaking and fabrication can be visualized as 
one would a film processing service: the designer produces the "artwork" (design files), from 
which the mask house makes "negatives" (masks), w'hich are then run on a fab line to 
produce "prints" (wafers). The maskmaking and fabrication sequence is function, design, 
and layout independent: the mask and fab firms do not require detailed information about 
the integrated systems they fabricate. If the original layouts satify the design rules, and 
satisfy a few constraints imposed by patterning and fabrication, then these processes will 
yield correctly patterned wafers. 

One need not closely bind a system’s design to the detailed processing specifications of 
particular mask and fab firms. Various firms will differ somewhat in the minimum value 
of the length unit A which they can successfully process. The transit time of the transistors 
fabricated, and the resistance per square and capacitance per unit area of fabricated features 
will also vary from one fab line to another. Flowever, well structured and relatively process 
independent nMOS designs will function correctly if scaled to a value of A appropriate for 
the chosen fabrication facilities, and operated using an appropriate system clock period. 


We next examine some of the present implementation procedures a bit more closely, to set 
the stage for sections on design and layout. Those later sections will be clearer if one can 
visualize how the design files are to be used during patterning and fabrication. 



I 
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Fig. 
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Patterning and Fabrication 

On completion of design and layout, the system design is contained in system layout files in 
intermediate form. Prior to fabrication, a final check plot of the layout is usually generated 
by converting these design files into files for driving a graphics plotter. Check plots are 
used for visually checking for design rule violations and other design errors. Once the 
designers have done as much visual checking as they are going to do, the system layout files 
are converted into pattern generator (PG) files, to be sent to the maskmaking facility. 
Figure 2 summarizes the sequence of patterning and fabrication procedures which then 
follows, and identifies the artifacts passed on at each step in the sequence. 

Maskmaking begins with pattern generation to produce reticles. Present pattern generators 
are projector-like systems containing (i) a precisely movable stage, (ii) an aperture of 
precisely variable rectangular size and angular orientation, and (iii) a light source, all 
program controllable by a computer system. To produce a reticle, a photographic plate is 
mounted on the stage, and the PG file for a particular system layer is used to direct the 
flashing of a sequence of rectangular exposures, of particular sizes and orientations, onto a 
sequence of coordinate locations on the plate, as illustrated in figure 3. 

The PG file contains a sequence of entries, each of which describes a rectangle. A typical 
representation uses five numbers for each rectangle: the x,y coordinates of its center, and its 
height, width, and angular orientation, as shown in figure 4. One can now visualize the 
nature of the conversion from intermediate form to PG files: the layout of each layer must 
be decomposed into its equivalent as a set of rectangles, each having [x,y,h,w,a] values 
flashable by the particular pattern generator, and these rectangles must be sorted into an 
efficient flashing sequence for that pattern generator. 

When the flashing sequence is completed, the plate is developed, yielding the reticle. A 
sketch of such a reticle is given in figure 5. Each reticle is a photographic master copy 
much like a photo negative, of the layout of one system layer, usually at a scale ten times 
(10x) the final system chip size. Photo enlargements of reticles, called "blowbacks", may be 
obtained from the mask house, to provide a further level of checking of design layout. PG 
file conversion, and pattern generation. At the current value of X = 3 microns, blowbacks at 
approximately 100 to I 50 limes actual chip dimensions have sufficient detail to enable visual 
checking of the smallest features. Blowbacks of reticles may also be obtained in the form of 
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color transparancies, to enable inspection of superposed overlays of various layers. 

Once the lOx reticles have been generated, a lx master mask is made from each reticle using 
a photorepeater, often called a step and repeat camera. The photorepeater exposes a 
photographic plate held on a moveable stage, as in the pattern generator. In this case, 
however, each plate exposure is a 10:1 photo reduction of the reticle pattern. Between 
exposures the stage is moved by a precise x,y stepping distance. This process is repeated 
until a complete array of lx chip patterns for one layer of the system has been exposed. 
The plate is then developed to produce a lx master mask. Figure 6 sketches such a mask 
made from the reticle in figure 5. 

Note that when each reticle is inserted in the photorepeater, the position and angular 
orientation of the reticle pattern is carefully adjusted by microscopic examination of two 
fiducial marks on the reticle. These marks are placed as part of the pattern generation 
process, and have the same precise position relative to the chip pattern origin on each of the 
system’s reticles, thus assuring that all mask levels produced with the photorepeater will 
accurately register with each other. 

A succession of contact prints is made from each master mask to yield a number of working 
masks . sometimes called working plates , for each system layer. These are the actual masks 
used in wafer fabrication. During the contact printing step of the typical wafer fabrication 
procedure, the working plates occasionally become worn or damaged, so several are usually 
made for each layer. 

The wafer fabrication facility uses the working plates in the sequence of patterning and 
process steps described in chapter 2, to produce finished wafers. The fab line requires no 
detailed information about the design or mask patterns of the integrated system being 
fabricated. However, several auxiliary patterns are normally included in the mask patterns, 
some of which are replicated on each chip and are examined during wafer fabrication: (i) 
alignment marks , which are used to accurately overlay successive masks with previous 
patterning steps, (ii) line width testers, sometimes called critical dimensions (C/D’s), which 
are lines in each mask layer of stated width that may be examined during mask making and 
fabrication to control dimensional tolerances, and (iii) a few simple test transistors and 
their associated probe pads, w'hich may be electrically tested prior to packaging to verify 
that the wafer fabrication process was successful. 
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The finished wafers are divided, into chips and packaged by the sequence of steps sketched 
in figure 7. The wafers are diced into individual chips by first scribing their surface along 
the boundary lines between chips, called scribe lines, with either a diamond tipped scribe or 
a diamond edged saw blade, and then fracturing them along these lines. Each individual 
chip is then cemented into the cavity of a package. After fine wires are bonded between the 
contact pads on the chip and the leads of the package, and a cover cemented over the cavity, 
the system is ready for testing. 

From the preceding we see that once a system's design files have been produced, all the 
remaining implementation procedures are design and layout independent, and largely 
automatic. However, the many extraneous parameters, patterns, and constraints involved in 
mask making and fabrication must be carefully thought through and defined in order to 
guarantee successful implementation within a reasonable turnaround time. The PG files 
must be correctly sorted and formatted for the pattern generator to be used. The lOx 
pattern of the chip must fit within the largest reticle that the pattern generator can produce. 
The photorepeater used will determine the shape, size, and location of the fiducial marks on 
the reticle. The size, surface material, and photographic polarity, either positive ( clear 
field-opaque features) or negative (dark field-transparent features), of the working 
plates will be a function of the fabrication facility to be used. Each fab line also typically 
prescribes its own patterns for the alignment marks and test transistors to be included along 
with the system in the mask patterns. 


While many designs may be scalable and have some longevity, the parameters, patterns, and 
constraints of maskmaking and fabrication are changing rapidly as the technologies evolve. 
This constant change complicates interactions with mask and fab firms. I.ater we describe 
procedures for implementating moderate sized LSI systems as part of multi-project chips. 
Such chips are collaborative efforts of many designers, enabling many projects to be merged 
into one maskmaking and wafer fabrication run. In this way the procedural overhead 
involved may be shared. 
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Hand Layout and Digitization using a Symbolic Layout Language 

A simple and common method of producing system layouts is to draw them by hand. This 
is typically done on a one lambda grid using the familiar color codes to identify various 
system layers. Once the layout has been hand drawn it can then be digitized, or translated 
into machine readable form, bv encoding it into a symbolic layout language. Hand layout 
and digitization using a symbolic layout language is quite a practical method of generating 
design files for highly structured system designs. Be warned, how'ever, that implementing 
irregular structures using these primitive procedures is a difficult and tedious task. 

If a system has only a few cell types which are replicated over and over, and otherwise has 
little "random wiring", one need draw only a single copy of each cell type, and then make 
reproductions or equivalent sized outlines of these cell drawings. All these cell 
reproductions may then be patched together to plan and build up the overall layout. 
Similarly, only one symbolic digitization need be made for each cell type. The replication 
of cells in various orientations and locations in the system layout can then be easily 
described using the symbolic layout language. In a sense, the ease with which a system’s 
layout can be described using a primitive layout languge provides a measure of the regularity 
of its design. The OM2 Data Chip pictured in the frontispiece was laid out and digitized in 
this way, using only the simplest machine aids. 

The function of a symbolic layout language, in its simplest form, is similar to that of a 
macro-assembler. The user defines symbols (macros) w'hich describe the layout of basic 
system cells. The locations and orientations of instances of these symbols are described in 
the language, as a function of appropriate parameters. These symbolic descriptions may 
then be mechanically processed in a manner similar to the expansion of a macro assembly 
language program, to yield the intermediate form description of the system layout, which is 
analogous to machine code for generating output files. An example intermediate form is 
described in a later section. The intermediate form files may be processed to yield the PG 
files: each layer being a machine encoded collection of rectangles encoded as [x,y,h,w,a] 
values. The generation of PG files is analogous to the loading and execution of machine 
code to produce output files: it is a process of "unrolling" and fully instantiating all symbol 
descriptions into a sequence and format suitable for a particular output device. Definition 
of simple layout languages and the construction of their assemblers is fairly straightforward. 
The reader mas define and implement layout languages by using the macro assembler or 
higher level language facilities of any commonly available computer system (Rl, R3). 
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The following example will clarify the concepts and procedures of hand layout and symbolic 
layout description: We wish to create an array of shift registers consisting of parallel 
horizontal rows of inverters coupled by clocked pass transistors, as in figure 5a., chapter 2, 
Figure 8a sketches the stick diagram of one row of the array. The entire array can be 
constructed from one basic cell containing an inverter, the pass transistor following it, VDD 
and GND buses crossing through on metal, and a clock line passing through on poly. Figure 
8b shows a hand sketch of the layout of the basic shift register cell, SRCELL, on a IX grid, 
subject to the design rules given in Ch.2, Sect.2. Since the inverters are coupled by pass 
transistors, the inverter pullup/pulldown ratio is ~ 8:1 (see Ch.l., Sect.2.). Also, while the 
4X wide metal lines could be IX narrower in between the contact regions, this would not 
decrease the cell size. As an exercise, the reader might check for design rule violations, and 
also for ways of further shrinking the ceil size. 

The SRCELL layout shown in figure 8b is composed using only rectangles placed at 
orientations which are integer multiples of 90°. The illustrations and descriptions in this 
section are considerably simplified by the use of such constrained layout constructions, and 
yet still illustrate the general principles involved. Were completely arbitrary shapes used, 
the SRCELL could be made somewhat smaller and still satisfy the design rules, 
interestingly, experience has shown that the simple extension of including rectangles at 
orientations which are integer multiples of 45° enables most cell layouts to reach within a 
few percent of the minimum area achievable using arbitrary shapes. There is a clear 
tradeoff here: the inclusion of increasingly complex geometrical objects in a layout will tend 
to reduce the minimum achievable layout area, but will also increase the computational 
complexity of the associated machine aids. 

We can informally characterize a simple layout language by examining figure 9, which 
contains a description of the layout of an array of SRCELLs using such a language. The 
language describes layouts as collections of BOXes on various layers. BOX statements 
describe each of these boxes by specifying their layer, the X.Y coordinates of their lower 
left corner and then their length, LX, in the x direction, and LY, in the y direction. The use 
of a box corner to encode its location simplifies the encoding task. BOX statements may 
describe arrays of identical boxes, with the array's lower left corner origin at X,Y, by 
including optional parameters which specify the number NX and replication interval IX in 
the \ direction, and NY and IY in the y direction. Dimensions are given in the length unit, 
X. A SCALE statement defines the value of X for this particular layout as X = 3.0 microns. 
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In figure 9, the SRCELL is first described as a macro, or SYMBOL. The reader can verify 
that the collection of BOXes in the definition of the SYMBOL SRCELL, when ORed 
together, produces the layout in figure 8b. This SRCELL is then replicated a number of 
times in various layout locations according to parameters in several DRAW statements. 

Each DRAW statement describes the placement of an array of cells as follows: The cell 
described by the named SYMBOL definition is considered to be drawn at the origin. It is 
then mirrored (about the x and/or y axis), and/or rotated (by 0°, 90°, 180°, or 270°) about 
the origin, as specified by MIRROR or ANGLE transformations. The cell thus positioned 
may then be replicated NX times at distance intervals IX in the x direction, and that row of 
cells then be replicated NY times at intervals IY in the y direction. The resulting array of 
cells is then translated a distance X,Y from the origin, and placed into the layout. 


SCALE 

LAMBDA = 3.0M!CRON; 


SYMBOL 

START, SRCELL; 


BOX 

Di FF,X = 3,Y=0,LX=4,LY =4,NY = 2,IY = 19; 


BOX 

DI FF,X = 2,Y=4,LX=6,LY=8; 


BOX 

DI FF,X = 8,Y=9,LX = 8,LY=2; 

INVERTER OUTPUT 

BOX 

DI FF,X = 16,Y = 9,LX=4,LY=4; 


BOX 

Di FF,X=4,Y = 12,LX = 2,LY = 7; 


BOX 

IMPL,X=2.5,Y=9.5,LX=5,LY = 10; 

PULLUP IMPLANT 

BOX 

POLY,X=0,Y = 5,LX= 10,LY=2; 

CELL INPUT 

BOX 

POLY,X = 12,Y=0,LX = 2,LY = 26; 

CLOCK LINE 

BOX 

POLY,X = 16,Y = 5,LX = 5,LY=^; 

CELL OUTPUT 

BOX 

POLY,X = 16,Y = 7,LX=4,LY=3; 


BOX 

POLY,X = 2,Y = l 1,LX = 6,LY=7; 


BOX 

CUTS,X=4,Y = 1 ,LX = 2,LY=2,NY = 2,1 Y=19; 


BOX 

CUTS,X = 17,Y = 8,LX = 2,LY'=4; 


BOX 

CUTS,X=4,Y=9,LX=2,LY =4; 


BOX 

METL,X=0, Y=0,LX = 21 ,LY=4,N Y=2,l Y = 19; 

VDD & GND 

BOX 

METL,X = 3,Y=8,LX=4,LY=6; 


BOX 

METL,X = 16,Y = 7,LX=4,LY = 6; 


SYMBOL 

END; 


DRAW 

SRCELL,NX=4,N Y = 2,1X = 21,1 Y = 38,X=0,Y=0; 


DRAW 

SRCELL,MIRRORX,NX=4,!X = 21,X=0,Y=42; 



END; 


Figure 9. Symbolic Description of Shift Register Array 


The "program" in figure 9 describes an array of 3 rows and 4 columns of SRCELLs. After 
machine assembly of this program, the resulting design file can be used to generate check 
plots, which may be inspected to detect errors made in encoding the layout. A check plot of 
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Figure 11. Check Plot of 3 by 4 Array of SRCELLS 

[Dimensions in lambda, with the cell outlines indicating 
relative cell placements according to the program in fig.9] 


(srarrav.press) 

































































































































































[Ch.4: Implementing Integrated System Designs: Sect.l]<Conway>inipl,vlsi January 25, 1978 5:55 PM 


9 


one SRCELL is given in figure 10a, and we see that the cell has been correctly digitized. A 
set of stipple patterns is used in this check plot to encode the different system layers, with 
the coding specified in figure 10b. If available, color checkplots are much belter: color 
check plots can be made denser and still be readable, and association of colors with layers 
and functions is more easily made and subject to fewer errors in practice. Note: the implant 
layer hasn't been plotted in fig. 10a, so that the other layers may be more easily seen, 

A check plot of the complete 3 by 4 array of cells is given in figure 11 (again the implant 
layer is not plotted). Although figure 11 is of insufficient scale to check details within the 
cells, it enables us to check for correct relative placement of the SRCELLs. The individual 
cel! outlines are included in figure 11, to indicate the nature of the placement of the central 
row of the array. By mirroring the central row prior to its placement, that row is able to 
share VDD and GND with the other two rows, thus reducing the overall array size. There is 
one column of cells per 21 lambda in the x-direction, and one row of cells per 19 lambda in 
the y-direction. It is very important to note that the outcome of each DRAW statement is 
determined by the order in which any mirror, rotate, replicate, and translate operations 
occur (see the section on the Caltech Intermediate Form, and also R2, Ch.6). Any 
permutation in the order of these operations may lead to a completely different result. 

In chapter 3 we found that the PLA is a useful subsystem structure, often used to implement 
finite state machines and combinatorial logic. We now present a worked out example of a 
PLA's layout, to further clarify symbolic layout description. Chapter 3 contains several 
stick diagrams of PLAs (figs. 13c, 15f). An examination of these stick diagrams reveals that 
the PLA can be constructed using 6 basic cell types and a slight amount of "random wiring". 
Once these 6 basic cells have been laved out by hand and symbolically digitized, it is easy to 
construct symbolic descriptions of different sized PLAs having various numbers of inputs, 
product-terms, and outputs. 

The digitized layouts of four of these basic cells are check plotted in figure 12. The AND 
and OR planes of the PLA are constructed as arrays of the 14A by 14A PLAcellpair cell 
plotted in figure 12a, which contains two poly and two metal signal lines, and one ground 
line on the diffusion layer. Diffusion paths may be added in any of four locations in such 
cells to form transistors, and thus program the PLA. The connection between the AND and 
OR planes is made using the Pl.Aconnect cell plotted in figure 12b: these cells change the 
signal paths from the metal to the poly layer. The pul!up transistors to be placed at the 
edges of the AND and OR planes are implemented b\ the PullupPair cell in figure 12c. I he 



[Ch.4.. Sectl] pag 




Fig. 12a. PLAcellpair 
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ground return paths, to be connected to the diffusion lines crossing the planes, are 
implemented by the PI.Aground cell in figure 12d. The PLAground cell is structured so that 
rows of the cell may be inserted at intervals within AND planes, and columns of Lhe cell 
inserted at intervals within OK planes, to provide proper ground returns in large PLA’s. 
The two other cell types required are the input drivers and output inverters: these cell 
layouts are left as exercises for the reader. The cells in figure 12 have been collectively 
planned so as to fit on a 14A pitch surrounding the PLA’s planes. Figure 13 contains a 
symbolic description of each of these cell types, and a description of a moderate sized PLA 
constructed from these cells: 


Figure 13. Symbolic Description of a 5-lnput, 10-Pterm, 8-Output PLA 


SCALE 

LAM BDA = 3.0M1CRON; 


PLA CELL DEFINITIONS: 

SYMBOL 

START.PL AC ELLPAIR; 

BOX 

D1FF,X=0,Y = 1 ,LX=4.LY=4,NY = 2,I Y = 7; 

BOX 

Dl FF.X = 8,Y=0.LX = 2,LY = 14; 

BOX 

POLY,Xs5,Y=0,LX = 2,LY = 14,NX = 2,IX = 6; 

BOX 

CUTS.X = 1,Y = 2.LX = 2,LY = 2,NY = 2,1 Y = 7; 

BOX 

METL,X=0,Y=1,LX = ] 4.LY=4,N Y = 2.!Y=7 

SYMBOL 

END; 

SYMBOL 

START.PLACONNECT; 

BOX 

Dl FF.X=(),Y= 1 ,LX=4,LY=4.N Y = 2,I Y = 7; 

BOX 

DIFF.X = 9,Y = 4,LX=4,LY = 4; 

BOX 

Dl FF,X = 13,Y = 4,LX = 3,LY = 2; 

BOX 

POLY,X = 6,Y = 1 ,LX = 10,LY = 2,NY = 2,IY = 8; 

BOX 

POLY,X = 3,Y = 1 ,LX = 3,LY=4,N Y = 2,I Y = 7; 

BOX 

POLY,X= 14,Y = 7,LX = 2,LY = 2; 

BOX 

CUTS,X = 1 ,Y = 2.LX=4,LY = 2,N Y = 2,IY = 7; 

BOX 

CU t'S.X = 10,Y = 5.LX = 2,LY=2: 

BOX 

M ETI..X =9,Y =0,LX=4.LY = 14; 

BOX 

M ETL.X =0,Y= 1,LX=6,LY=4,NY = 2,IY = 7; 

SYMBOL 

END; 

SYMBOL 

START.PULLUPPAIR; 

BOX 

IMPL,X=8.5,Y=0.5.LX=11,LY = 5; 

BOX 

I MPL,X=0.5,Y=4.5.LX = 5,LY=8; 

BOX 

1MPL,X=0.5,Y = 7.5 1 LX=9,LY = 5; 

BOX 

DIFF,X = 0,Y= 1,LX=4,LY=4; 

BOX 

DIFF,X = 4,Y = 2,LX = 16,LY=2; 

BOX 

DIFF.X = 2,Y=5,LX=2,LY=4; 

BOX 

DIFF,X = 2,Y=9,LX = 18,LY=2; 

BOX 

D1FF.X = 9,Y = 8,LX=4,LY=4; 

BOX 

POLY.X=10,Y=O.LX=8.LY=6; 

BOX 

POLY.X=18,Y = l,LX=2.LY=4; 

BOX 

POI Y,X = 8,Y=8,I X = 2.LY=4; 

BOX 

POI Y.X=0,Y = 7,LX = 8,LY = 6; 

BOX 

I’OLV ,X=0,Y=(i,LX=f).LY=l; 

BOX 

CUTS.X = 1 ,Y = 2.1 X = 2.l.Y = 2.NX = 2.IX = 17; 

BOX 

CUTS.X = S.Y = 9.1 X=4,LY=2; 

BOX 

MF I l..X=0.Y=(U.X=4.LY=14; 

BON 

ME I t ,X = 7,Y=8.1.X=fe,l Y=4; 

BOX 

MF I l.,\:17,Y:l,I.X=4 1 I.Y=4; 

SYMBOI 

END; 


[SEE FIGURE 12A,] 
DIFF TO GND 

METL TO PULLUPS 

[SEE FIGURE 12B ] 


GND 

[SEE FIGURE 12C.] 


VDD 
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SYMBOt 

START,PLAGROUND; 

BOX 

DIPF,X = 8,Y = l,LXs2,LY=9; 

BOX 

DIEF.X=6,Y = 3,LX = 4,LY=4; 

BOX 

POI Y,X = 3.Y=0,LXs2,LY=10; 

BOX 

POI .Y,X = 5.Y = 0,LX = 2.LY = 2NY =2,1 Y = 

BOX 

P()l.Y,X = 11,Y=1,LX=2,LY=9; 

BOX 

CUTS.X = 7.Y=4,LX = 2,LY = 2; 

BOX 

METL.X =0,Y=3,LX = 14,LY=4; 

SYMBOL 

END; 

SYMBOL 

START,PLAINPUT; 


[SEE FIGURE 12D.] 


GND 


[ insert symbol definition; size: 14 wide by ~35 high ] 

SYMBOL END; 

SYMBOL START,PLAOUTPUT; 

[ insert symbol definition; size: 14 wide by -41 high ] 

SYMBOL END; 


LAYOUT 5-INPUT, 10-PTERM,8-OUTPUT 
[SEE FIGURE 14] 


PLA: 


DRAW 

PLACELLPA! R,NX = 5,NY = 5,1 X= 14,1 Y = 14,X 

n 

o 

< 

ii 

p 

DRAW 

PLACONN ECT.N Y=5,1Y = 14,X = 70,Y=0; 


DRAW 

PU L.L.UPPA1 R,N Y = 5,l Y = I4,X=- 19,Y=0; 


DRAW 

PLAGROU ND,NX = 5,N Y = 2.!X = 14,1 Y = 79,X = 

0,Y=-10: 

DRAW 

PLACELLPAI R,ANGLE = 270,NX=4.N Y = 5,1X = 14,1 Y = 14,X = 86,Y = 14; 

DRAW 

PU 1 LUPPA1 R.ANGLE = 270,NX = 4,IX = 14,X = 

:86,Y = 89; 

DRAW 

PL AG ROUND, ANGLE=2 70,NY = 5,1 Y= 14.X 

= 141,Y = 14; 

DRAW 

PLAlNPLT,NX=5.IX=14.X=0.Y=-44; 


DRAW 

PLAOUTPUT,NX = 5,IX = 14,X = 86,Y = -41; 


BOX 

D1FE.X = 70,Y=-15,LX=4,LY=4; 


BOX 

CUTS.X = 7l,Y=- 14,LX = 2,LY = 2; 


BOX 

METL,X = 70,Y = - 15,LX=4,LY =4; 


BOX 

METL,X = - 19,Y = 70,LX=4,LY =9; 

VDD 

BOX 

METL,X=- 19,Y = 79LX = 105,LY=4; 

VDD 

BOX 

METL,X=82,Y=83,LX=4,LY=6; 

VDD 

BOX 

METL,X = 142,Y = 85LX = 9,LY=4; 

VDD 

BOX 

M ET L.X = 151. Y = -40,LX=4,LY = 129; 

VDD 

BOX 

METL,X = 142.Y = -40,LX=9,LY = 4; 

VDD 

BOX 

M ETL,X = -19,Y = -15,LX = 19.LY = 4; 

VDD 

BOX 

M ETL,X=-19,Y = -1! ,LX=4,LY =11; 

VDD 

BOX 

METL,X = 70,Y = 71,LX = 9,LY=4; 

GND 

BOX 

M ETL,X = 70,Y = -7,LX=9,LY=4; 

GND 

BOX 

M ETI..X = 70,Y = -24,LX=9.LY=4: 

GND 

BOX 

MFTL,X = 79,Y = -45,LXs4.LY*45; 

GND 

BOX 

M ETL,X = 83,Y = -21 ,LX = 3,LY=4; 

GND 

BOX 

M F.TL,X = 142,Y = -21 ,LX = 2,LY=4; 

GND 

BOX 

M ETL.X = 144,Y = -21,LX=4,LY = 21; 

GND 

BOX 

POI Y,X = -4,Y = -43,LX = 4,LY = 2; 

PHI 

BOX 

POLY,X = 142,Y = -7,LX = 15,LY=2; 

PH2 


[ insert the PLA "program", using BOXes on the diffusion 
layer to form transistors in the PLAcellpair cells ] 

[ insert the PLA's input, output, clock, and power connections ] 

END; 


A check plot of the PI.A described above is given in figure 14. This check plot has been 
simplified to include only the outlines of the basic cells, plus the additional wiring necessary 
to complete the PI.A. The dimensions and orientations of the cells may he found by 
comparing these outlines with the cell details in figure 12. Note that in figure 12 some of 
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142,89 



Fig, 14. Check Plot using only Cell Outlines, of the 5-Input, 10-Pterm, 8-Output PLA 

[dimensions in lambdas; symbol labels on origin cells of the fig. 13 DRAW statements] 


(p la. press) 
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the connect ion points, where paths leave or enter at cell edges or w'here internal connections 
may be later inserted, are tagged with tick marks. Cell placements and orientations in the 
check plot may be visualized by locating and identifying the appropriate connection point 
marks. A comparison of the check plot with the symbolic description above will clarify the 
function of the various DRAW statements. To assist in this comparison, the origin cell of 
the array of cells produced by each DRAW statement has been marked in figure 14 with its 
cell name. Note that this PLA layout could contain the PLA example of chapter 3, fig. 15f. 

Symbolic layout languages are easy to define, and may be primitive or sophisticated, 
according to the requirements of the user. The function of the assembler for such a 
language is simply to scan and decode the statements and translate them into design files in 
intermediate form. Conversion of design files into check plot or pattern generator output 
files is straightforward for the above simple language, since we have used only boxes with a 
severe constraint on angular orientations. MIRROR and ANGLE transformations are easily 
handled: x and y coordinates of symbols and boxes are simply replaced by ±x or ±y, 
according to the specific parameters, during the instantiation of symbols and drawing of 
boxes prior to their replication and translation into the layout output file. 

The effectiveness of the above language could be further increased by constructing an 
assembler capable of handling nested sysmbols. By using nested symbols, system layouts 
may be described in a hierarchical manner, leading to very compact descriptions of 
structured designs. At the lowest level, one might define symbols for such small but 
commonly encountered structures as the various forms of contacts. Boxes and these simple 
symbols could then be used to construct cells such as those in the PLA example above. The 
PLA could be constructed w'ith these cells, and then defined as a symbol to be used in a 
larger design. An example of the sort of function one might add to create a much more 
sophisticated language, and language processor, would be the capability of generating the 
layout description of a PLA from the collection of basic cells, as a function of its input, 
pterm, and output size parameters and logic function parameters. 

Figure 15 summarizes the procedures and artifacts of hand layout, and layout description 
and digitization using a layout langauge. By studying figure 15, and thinking back over the 
material and examples of this section, one can visualize a complete, though primitive, 
sequence of steps sufficient to prepare a design for implementation. These procedures are 
entirely adequate for preparing small LSI projects for implementation. The procedures may 
also be used for those large LSI systems which have highly structured designs. 
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Fig. 15. Design, Hand Layout, Design File Generation, And Design Checking 
Using a Symbolic Layout Language and Simple Machine Aids 
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The primary obstacle that these primitive procedures place in the path of the system 
designer is the sheer lime and effort it takes to get through the loop to a new check plot 
each time a small design change is made. The enthusiasm aroused by a sudden insight, such 
as the conception of a completely new topological possibility for an important system cell, 
can be dampened by the tedious tasks of hand layout and box digitization required before 
one can really see the full effect of the idea on the overall system layout. 

Though often supported by large batch mode CAD systems for containing, modifying, 
checkplotting, and simulating designs, the majority of LSI layout now done in industry 
begins with hand layout. Digitization is usually simplified by the use of digitizing tables, 
which are much like graphics plotters in reverse: a new section of a design, laid out by hand, 
is placed on the table and digitized by tapping switches while manually following the 
outlines of the cell's boxes with a pointer. Although this is less tedious than digitization 
using a layout language, it is still time-consuming and hardly interactive. 

The next section describes an interactive graphics layout system which enables the system 
designer to quickly sketch new layout ideas and see their effect immediately. 
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An Interactive Layout System 

[Seetiun contributed by Douglas Fairbairn, Xerox PARC, and James Rowson, Caltech] 

Computing hardware of sufficient power to support highly interactive graphics has in the 
past quite expensive, and this has inhibited the widespread application of interactive 
computing techniques. However, because of expected advances in VLSI technology, we are 
rapidly approaching the day when many will have access to personal computers with 
computing power rivaling today's medium to large-scale systems. It will be more difficult 
to provide effective software for these systems than it will be to build the computers 
themselves. In this section we describe a highly interactive layout system which runs on a 
modest personal computer, rather than on an expensive, limited access, centralized system. 
This system was developed anticipating the work environment of the future, in w'hich most 
"knowledge" workers will have personal computers as part of their normal office equipment. 

ICARUS^ (Integrated Circuit ARtwork Utility System) is a software system w'hich enables 
the user to create and modify an integrated system layout directly on a CRT display screen. 
ICARUS was conceived with the idea that the designer would create and edit a layout at the 
display, without doing any more than a rough sketch or "stick diagram" before beginning 
work. Creating and moving items is fast and easy enough so that the designer can truly 
sketch on the screen. Once the layout is basically correct, the items can be moved or 
modified to arrive at the most compact layout. 

The user is required to remember very little about the available commands or their use 
because the commands themselves are displayed on the screen and the system prompts the 
user for additional information as it is needed. The system can format and output check 
plots to matrix type printers or on raster-scan laser printers. ICARUS design files can be 
used to create standard pattern generation files from which masks can be made. An 
overview of design and layout procedures using the system is given in figure 16. It is 
instructive to compare this with figure 15, which presents equivalent steps for hand layout. 

All the software to accomplish these various steps runs on a small experimental 
minicomputer known as the Alto. This machine was designed by researchers at Xerox 
PARC as a general purpose personal computer suitable for both text and graphics 
applications^. No additional, special hardware is used by ICARUS. The ICARUS system is 
programmed in BCPI.. an ATGOl.-like high level language. I here are about 30K words of 
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Fig. 16. Design, Layout, Design File Generation, and Design Checking 
Using an Interactive Graphics Layout System 


(intactlayout sil) 
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compiled code in the system of which half is in memory at any given time. At minimum, 
the Alto memory has 64K 16 bit words. A 2.5 Mbyte cartridge disk drive is an integral part 
of the system. The user interacts with the system through an unencoded keyboard (software 
definable keys) and with a pointing device called a mouse (R2, pl73). A cursor is 
controlled on the screen by moving the mouse around on a small area of the user's desk. A 
bit map display with a resolution of 600x800 dots is used for output, and printers for doing 
check plots are available through an in-house computer network. 

The ICARUS display features two windows which provide a flexible working view of the 
layout, as shown in figure 17. The upper window is normally used for viewing a large piece 
of the layout at small magnification, and the lower window used for looking at a smaller 
section in more detail. The magnifications of the windows may be set independently. 

In addition to the two windows there are various menus and status lines presented in the 
display. The menu on the left is the command menu. The menu under the upper window is 
the parameter menu. Under the parameter menu is the stipple menu, containing the mask 
level codes. Rectangles at a given level are stippled with the pattern for that level. The 
patterns were chosen so that, where necessary, one pattern could be seen through the other to 
verify that appropriate layers are overlapping properly. Current drawing coordinates and 
the status of system memory space are displayed to the right of the stipple menu. 

The user interface is implemented principally through the display, the mouse and five 
conveniently located keys on the keyboard. Frequently used commands are given using only 
one or two simple hand operations, and can be done without glancing away from the 
display. These characteristics, coupled with rapid display redrawing, enhance the system's 
interactiveness. 

The internal data representation in ICARUS is based on three types of items: rectangles, 
symbols, and text strings. The organization of these items into memory data structures, and 
the typical run-time memory space allocation is illustrated in figure 18. 

Rectangles are created with the aid of the mouse. They may have angular orientations 
which are integer multiples of 45°. They can be moved, copied, or deleted using the mouse 
and one key. As items are created, they are added to an item list in main memory. Fach 
rectangle is stored as 6 words in memory: the first word is the pointer to the next item, the 
second specifies what layer it is on. what type of item it is, etc. I he third through sixth 
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Fig. 17. The ICARUS Display: 2 Views of a Layout in Progress (windows.prcss) 
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FILE ADDRESS 
NAME POINTER 
XMIN OF BOUNDING BOX 
XMAX OF BOUNDING BOX 
YMIN OF BOUNDING BOX 
YMAX OF BOUNDING BOX 


H OF CONNECTION POINTS 
CONNECTION POINTS 
CONNECTION POINTS 


ITEM DESCRIPTION 


ITEM DESCRIPTION 


NUMBER IN X DIRECTION 
NUMBER IN Y DIRECTION 
SPACING IN X DIRECTION 
SPACING IN Y DIRECTION 
X COORD OF ORIGIN 
Y COORD OF ORIGIN 


LINK TO DEFINITION 


ITEM DESCRIPTION 


string length 



I i i 

I l l 


MAIN MEMORY ALLOCATION DETAILED VIEW OF DATA 


Figure 18. ICARUS Memory Allocation and Data Structure 


(memory, ail) 
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words specify the minimum and.maximum x and y coordinates. The items are kept in order 
of increasing values of minimum x coordinate, so that the display may be quickly redrawn. 

When a symbol is defined by the user, the items which are contained within it are stored on 
the disk, while a pointer, the name and the bounding box for the symbol are placed in main 
memory. Symbols can be nested to any level. Once a symbol definition has been created, 
one is free to define symbol instances which are references to that definition. The symbol 
instance may be a command to draw one copy of the symbol at a certain location, or a 
whole array. The size of the symbol instance, which resides in main memory, is the same in 
both cases. The use of symbols wherever possible tends to preserve main memory space. 
Rather large systems can be designed using ICARUS, if the systems are well structured and 
make extensive use of symbols. This is true even when using a minimum sized 64K 
memory, which leaves little space for layout data. 

Text is used for identifying data and control lines and is merely a memory aid to the user. 
There is no attempt to make use of the text or other information in the drawing for 
connectivity or other types of checking. 

Operations more complex than those such as draw and move are implemented through the 
use of menus as shown in figure 17. The desired command is chosen by pointing at it with 
the cursor and clicking a mouse button. The selected command is then inverted to white on 
black video to identify its selection, which the user then confirms with a key on the 
keyboard. At this point, the system prompts the user with instructions presented in the 
display area normally holding the stipple menu. The instructions lead the user through the 
individual steps required, for example, to mirror or rotate a group of items. 

Operations on symbols are defined in a secondary menu which can be reached by selecting 
the command "symbols" on the primary menu. The secondary menu offers commands such 
as define symbol, draw' symbol, list the names of the symbols is the symbol library, or 
expand symbol. This last command is used to modify a symbol w'hich is already defined, 
the modified symbol definition immediately updating all symbol instances which point to it. 

Various system parameters are displayed in the parameter line directly below the top 
window. Values such as. the default line width for the currently selected layer, the 
magnification of the top and bottom window's, and the spacing of the tick marks are all 
displayed. The parameter values can be changed at any time by selecting the desired one 
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and typing the new parameter value on the keyboard. The X,Y layout coordinates of the 
point last clicked with the mouse are displayed at the right of the screen. The DX.DY 
distances between the last two clicks are also displayed. This feature provides a convenient 
"ruler" for measuring distances on the layout. 

The construction of an interactive layout system such as ICARUS is a relatively 
straightforward task for one who is experienced in interactive computer graphics (R2), given 
a display oriented minicomputer system and effective systems building software. A first 
version of ICARUS was constructed in 3 man-months, and a mature version produced in an 
additional 4 man-months. 

ICARUS has been used internally in Xerox to lay out many integrated system projects, and 
to organize a number of multi project chips. Among the users were a number of individuals 
previously unfamiliar with integrated circuit layout, who nevertheless successfully completed 
LSI projects with up to 10,000 transistors. We find that the interactive nature of such a 
system not only aids the experienced designer, but also enhances the learning process for the 
novice. We believe that such interactive, personal design systems greatly enhance the 
creative ability of the designer by enabling easy generation and examination of many more 
design alternatives per unit time than would be the case with centralized, non-interactive 
design systems. 

However, there is more to integrated system design than circuit layout. Design rules must be 
checked, logic transfer functions tested, and, in certain cases, circuit transfer functions 
computed to determine delays and predict system performance. We believe that the 
direction in which to search for further improvements in design tools is in the replacement 
of the primitive ICARUS type of data structure with one which allows design functions 
other than just layout to also interactively operate upon the same data base. This is the 
subject of the later section on fully integrated design systems. 
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The Caltech Intermediate Form for LSI Layout Description 

[Section Contributed by Robert F. Sproull, Carnegie Mellon University, and Richard F. Lyon, Xerox PARC] 


The Caltech Intermediate Form (CIF Version 2.0) is a means of describing graphic items 
(mask features) of interest to LSI circuit and system designers. Its purpose is to serve as a 
standard machine readable representation from which other forms can be constructed for 
specific output devices such as plotters, video displays, and pattern-generation machines. 
The intermediate form is not intended as a symbolic layout language: CIF files will usually 
be created by computer programs from other representations, such as a symbolic layout 
language or an interactive design program. Nevertheless, the form is a fairly readable text 
file, in order to simplify combining files and tracing difficulties. 

The basic idea of the form is to specify literally every geometric object in the design using 
ample precision. Use of this form provides participating design groups easy access to output 
devices other than their own, enables sharing designs with others, allows combining several 
designs to form a larger chip, and the like. It is not necessary for all participating groups to 
implement the entire set of features of CIF, as long as their programs and documents 
contain warnings about unimplemented functions; nevertheless, the syntax must be correctly 
interpreted by all programs that read CIF, to assure a reasonable result. 

CIF thus serves as the common denominator in the descriptions of various integrated system 
projects. No matter what the original input methods are (hand layout and coding, or a 
design system), the designs will be translated to CIF as an intermediate, before being 
translated again to a variety of formats for output devices or other design aids. 

The original CIF was conceived by Ivan Sutherland and Ron Ayers in 1976. Subsequent 
improvements were contributed by Carlo Sequin, Douglas Fairbairn, and Stephen 
Trimberger. 

This specification is divided into four parts: a description of the syntax of the form, a 
description of the semantics, an explanation of the transformations used, and a discussion 
of the conversion of wires to boxes. 
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Syntax 


A CIF file is composed of a sequence of characters in a limited character set. The file 
contains a list of commands, followed by an end marker; the commands are separated with 
semicolons. Commands are: 


Form 

P path 

B integer integer point point 

R integer point 
W integer path 

L shortname 

DS integer integer integer 
DF 

DD integer 

C integer transformation 

digit userText 
{ commentText ) 

E 

A more formal definition of the syntax is given below. The standard notation proposed by 
Niklaus Wirth^ is used: production rules use equals = to relate identifiers to expressions, 
vertical bar | for or, and double quotes " " around terminal characters; curly brackets {} 
indicate repetition any number of times including zero; square brackets [] indicate optional 
factors (i. e. zero or one repetition); parentheses () are used for grouping; rules are 
terminated by period. Note that the syntax allows blanks before and after commands, and 
blanks or other kinds of separators (almost any character) before integers, etc. The syntax 
reflects the fact that symbol definitions may not nest. 


Command 

Polygon with a path 

Box with length, width, center, and direction 
(direction defaults to (1,0) if omitted)- 
Round flash with diameter and center 
Wire with width and path 

Layer specification 

Start symbol definition with index, a, b 
(a and b both default to 1 if omitted) 
Finish symbol definition 
Delete symbol definitions 
Call symbol 

User extension 
Comments with arbitrary text 
End marker 


Cif File 
command 

primCommand 


{ { blank ) [ command ] semi ) endCommand { blank }. 
primCommand | defDeleteCommand I 

defStariCommand semi ( { blank } [ primCommand ] semi } defFinishCommand. 
polygonCommand | boxCommand | roundFlashCommand | wireCommand | 
layerCommand | callCommand I userExtensiorCommand ! commentCommand. 


polygonCommand = 

boxCommand = 

roundFlashCommand = 

wireCommand - 

layerCommand = 

defStartCommand = 

defFinishCommand 
defDeleteCommand = 

callCommand = 

UserExlensionCommand = 
commentCommand = 

endCommand = 


"P” path. 

"B" integer sep integer se 
"R" integer sep point. 

"W" integer sep path. 

"L" ( blank } shortname. 
"D” { blank } "S" integer | 
"0” { blank } "F". 

"D" { blank } "D" integer. 
"C" integer transformation, 
digit userText. 

"(" commentText ")". 

"E". 


point [ sep point ]. 


sep integer sep integer ]. 



[Ch.4, Sect.2: C!F by R. F. Sproull and R. F. Lyon ]<Conway>imp2.vlsi January 28, 1978 10:16 AM 


transformation 

= 

{ { blank } ( "T" point [ "M" { blank } "X" I "M" { blank } "Y" | "R" 

path 

= 

point ( sep point }. 

point 

= 

slnteger sep slnteger. 

slnteger 

= 

{ sep } [ ] integerD, 

integer 

= 

{ sep } integerD. 

integerD 

- 

digit { digit }. 

shortname 

- 

C [ C ] [ c ] [ c ]. 

c 

= 

digit | upperChar. 

userText 

= 

{ userChar ]. 

commentText 

= 

{ commentChar } | commentText "(" commentText ")" commentText. 

semi 

s 

{ blank } { blank }. 

sep 

= 

upperChar | blank. 

digit 

= 

"0" | "1" | "2" | "3" I " 4 ” | "5" 1 " 6 " | "7" 1 ’'8" | "9". 

upperChar 

- 

•'A" | "B" | "C" | ... | "Z". 

blank 

- 

any ASCII character except digit, upperChar, or 

userChar 

- 

any ASCII character except 

commentChar 

- 

any ASCII character except "(" or 


Semantics 

The fundamental idea of the intermediate form is to describe unambiguously the geometry 
of patterns for LSI circuits and systems. Consequently, it is important that all readers and 
writers of files in this form have exactly the same understanding of how the file is to be 
interpreted. Many of the decisions in designing the file format were made to avoid 
ambiguity or small but troublesome errors: floating point numbers are avoided; there are no 
iterative constructs, though there may be in future additions to C1F. 


A simple file format might include only primitive geometric constructs, such as polygons, 
boxes, flashes and wires. Unfortunately, the geometric description of a chip with hundreds 
of thousands of rectangles on it would require an immense file of this sort. Consequently, 
we have made provision for defining and calling symbols; this should reduce the size of the 
file substantially. 


It is important that programs processing CIF files operate cautiously, maintaining a constant 
vigilance for mistakes or entries that will not be processed properly. The description below 
mentions implementation suggestions or cause for caution inside brackets []. 
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Measurements. The intermediate form uses a right-handed coordinate system shown in 
Figure 19a, with x increasing to the right and y increasing upward. (Directions and 
distances are always interpreted in terms of the front surface of the finished chip, not in 
terms of the various sizes and mirrorings of the intermediate artifacts.) The units of 
distance measurement are hundredths of a micron (/am); there is no limit on the size of a 
number. [Programs reading numbers from OF files should check carefully to be sure that the number does 
not overflow the number of bits in the internal representation used, and should specify their own limits, if 
any.] 

Directions. Rather than measure rotation by angles, C1F uses a pair of integers to specify a 
"direction vector.” (This eliminates the need for trigonometric functions in many 
applications, and avoids the problem of choosing units of angular measure.) The first 
integer is the component of the direction vector along the x axis; the second integer along 
the y axis. Thus a direction vector pointing to the right (the +x axis) could be represented 
as direction (1 0), or equivalently as (17 0); in fact, the first number can be any positive 
integer as long as the second is zero. A direction vector pointing NorthEast (i.e., rotated 45 
degrees counterclockwise from the x axis) would have direction (1 1), or equivalently (3 3), 
and SO on. [A (0 0) direction vector may be defaulted to mean the +x axis; a warning should be generated]. 

Geometric primitives. The various primitives that specify geometric objects are not intended 
to be mutually exclusive or exhaustive. CIF may be extended occasionally to accommodate 
more exotic geometries. At the same time, it is not necessary to use a primitive just because 
it is provided. Notice in the examples below that lower case comments and other characters 
within a command are treated as blanks, and that blanks and upper case characters are 
acceptable separators. 


Boxes'. Box Width 60 Length 25 Center 80,40 Direction -20,20; (or B60 .25 80 40 -20 20;) 

The fields which define a box are shown graphically in Figure 19a. Center and direction 
(optional, defaults to +x axis) specify the position and orientation of the box, respectively. 
Length is the dimension of the box parellel to the direction, and Width is the dimension 
perpendicular to the direction. 

Polygons'. Polygon A 0,0 B 10,20 C -30,40; (or PO 0 10 20 -30 40;) 

A polygon is an enclosed region determined by the vertices given in the path, in order. For 
a polygon with n sides, n vertices are specified in the path (the edge connecting the last 
vertex with the first is implied; see Figure 19b). [Programs that try to interpret polygons may place 
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various restrictions on their paths; no set of constraints has been generally accepted, and no program currently 
exists for converting completely general polygons to pattern generator output.] 

Flashes'. RoundFlash Diam 200 Center -500,800; (or R200 -500 800;) 

The diameter of a flash is sufficient to specify its shape, and the center specifies its 
position, (see Figure 19b). [Some programs may substitute octagons, or other approximations, for round 
flashes.] 

Wires: Wire Width 50 A 0,0 B 10,20 C -30,40; (or W50 0 0 10 20 -30 40;) 

It is sometimes convenient to describe a long, uniform width run by the path along its 
centerline. We call this construct a wire (see Figure 19b). An ideal wire is the locus of 
points within one half-width of the given path. Each segment of the ideal wire therefore 
includes semicircular caps on both ends. Connecting segments of the wire is a transparent 
operation, as is connecting new wires to an existing one: the semicircular overlap ensures a 
smooth connection between segments in a wire and between touching w'ires. [For output 
devices that have a hard time constructing circles, we approximate the ideal wire with squared-off ends. Notice 
that squared-off ends work nicely for segments meeting at right angles, but cause problems if wires or wire 
segments are connected at arbitrary angles. A way to circumvent this problem is to convert, prior to output, any 
wires in a file into connected sets of boxes of appropriate length, width, angle and center position (Figure 19c). 
The width of each box is the same as the width of the wire. The length of the boxes must be adjusted to 
minimize unfilled wedges and overlapping "ears". An algorithm for constructing boxes from a wire description 
is given in a later subsection. If the wire is specified within a symbol definition, the approximation need be 
computed only once, and can then be used each time the symbol is instantiated.] 

Layer specification : Layer ND nmos diffusion; (or LND;) 

Each primitive geometry element (polygon, box, flash, or wire) must be labeled with the 
exact name of a fabrication mask on which it belongs. Rather than cite the name of the 
layer for each primitive separately, the layer is specified as a "mode" that applies to all 
subsequent primitives, until the layer is set again (layer mode is preserved across symbol 
calls, which are discussed later). 

The argument to the layer specification is a short name of the layer. Names are used to 
improve the legibility of the file, and to avoid interfering with the various biases of 
designers and fabricators about numbers (one person's "first layer” is another's "last"). [The 

imemiun of the layer specification command is to label locally the layer for a particular geometry, It is 
therefore senseless to specify a box. wire, polygon or flash if no layer has been specified. In order to detect 
this error, the command LZZZZ is implicitly inserted at the beginning of the file, and as the first command of 
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a symbol definition (OS; see below). Any attempt to generate geometric output on layer ZZZZ will result in an 
error.] 


It is important that layer names be unique, so that combining several files in intermediate 
form will not generate conflicts. The general idea is that the first character of the name 
denotes the technology, and the remainder is mnemonic for the layer. At present, the 
following layers are defined: 

ND NMOS Diffusion 

NP NMOS Polysiiicon 

NC NMOS Contact cut 

NM NMOS Metal 

Nl NMOS depletion mode Implant 

NB NMOS Buried contact 

NG NMOS overGlass openings 


New layer names will be defined as needed. 


[Programs that read CIF will want to check to be sure that layer names used do in fact correspond to 
fabrication masks being constructed. However, the file may cite layer names not used in a particular pass over 
the CIF file. It would be helpful for the program to provide a list of the layer names that it ignored.] 

Symbols. Because many LSI layouts include items that are often repeated, it is helpful to 
define often-used items as "symbols.” This facility, together with the ability to "call" for an 
instance of the symbol to be generated at a specific position, greatly reduces the bulk of the 
intermediate form. 

The symbol facilities are deliberately limited, in order to avoid mushrooming difficulties of 
implementing programs that process CIF files. For example, symbols have no parameters; 
calling a symbol does not allow the symbol geometry to be scaled up or down; there are no 
direct facilities for iteration. The main reason for symbol facilities is to limit the file size; 
if the symbol mechanism is not adequate for some application, the desired geometry can still 
be achieved with less use of symbols, and more use of explicit geometrical primitives. 
[Symbols need not be used at all; this eliminates the need for intermediate storage for symbol definitions, but 
results in larger design files. Machines which must process a fully-instantiated representation of a layer (such 
as pattern generators) might only accept CIF files without symbol definitions, to reduce the cost of 
implementation. Therefore, it would be useful to have a program that would convert general CIF files to fully 
instantiated CIF files, and maybe to sort by layer, location, or whatever.] 


The ability to call for iterations (arrays) of symbols is not provided in CIF Version 2.0. 
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This is primarily due to the difficulty of defining a standard method of specifying 
iterations, without introducing machine-dependent computation problems. It is still 
possible to acheive a great deal of file compaction by defining several layers of symbols (e.g. 
cell, row, double-row, array, etc.). However, the ability to iterate symbol calls is a likely 
prospect for a future addition to C1F. 

Defining symbols : Definition Start #57 A/B = 100/1; ... ; Definition Finish; (or DS57 1 00 1; ... ;DF;) 

A symbol is defined by preceding the symbol geometry with the DS command, and 
following it w'ith the DF command. The first argument of the DS command is an 

identifying symbol number (unrelated to the order of listing of symbol definitions in the 

file). 

The mechanism for symbol definition includes a convenient way to scale distance 
measurements. The second and third arguments to the DS command are called a and b 
respectively. As the intermediate form is read, each distance (position or size) measurement 
cited in the various commands (polygons, boxes, flashes, wires and calls) in the symbol 
definition is scaled to (a*distance)/b. For example, if the designer uses a grid of 1 micron, 
the symbol definition might cite all distances in microns, and specify a=100, b=1. Or the 
designer might choose lambda (characteristic fabrication dimension) as a convenient unit. 
This mechanism reduces the number of characters in the file by shrinking the integers that 
specify dimensions and may improve the legibility of the file (it does not provide scaling, or 
the ability to change the size of a symbol called within the definition). 

Definitions may not nest. That is, after a DS command is specified, the terminating DF 

must come before the next DS. The definition may, however, contain calls to other 
symbols, which may in turn call other symbols. 

There is only one restriction on the placement of symbol definitions in the file: a symbol 
must be defined before its instantiation becomes necessary. This constraint can be satisfied 
by placing all symbol definitions first in the file, and then calls on the symbols. In fact, it 
is often convenient to have the file consist exclusively of symbol definitions and ONE call 
on a symbol. This call will be the last command in the file before the end command. [If a 
definition redefines a symbol that already exists, the previous definition is discarded; a warning message should 
be generated. When several people contribute to a design, some s\mboi management is therefore necessary; see 
Deleting symbol definitions below.] 
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Calling symbols : Call Symbol #57.Mirrored in X Rotated to -1,1 then Translated to 10,20; 

The C command is used to cal! a specified symbol and to specify a transformation that 
should be applied to all the geometry contained in the symbol definition. The call 
command identifies the symbol to be called with its "symbol index," established when the 
symbol was defined. 

The transformation to be applied to the symbol is specified by a list of primitive 
transformations given in the call command. The primitive transformations are: 

T point Translate the current symbol origin to this point. 

M X Mirror in X, i.e., multiply X coordinate by -1. 

M Y Mirror in Y, i.e., multiply Y coordinate by -1. 

R point Rotate symbol's x axis to this direction. 

Intuitively, each coordinate given in the symbol is transformed according to the first 
primitive tranformation in the call command, then according to the second, etc. Thus "Cl 
T500 0 MX” will first add 500 to each x coordinate from symbol 1, then multiply the x 
coordinate by -1. However, "Cl MX T500 0” will first multiply the x coordinate by -1, and 
then add 500 to it: the order of application of the transformations is therefore important. 
In order to implement the transformations, it is not necessary to perform each primitive 
operation separately; the several operations can be combined into one matrix multiplication 
(see the subsection on transformations). 

Symbol calls may nest; that is, a symbol definition may contain a call to another symbol. 
When calls nest, it is necessary to "concatenate" the effects of the transformations specified 
in the various calls (see the subsection on transformations). [There is no sensible way in which a 
symbol may be invoked recursively (i.e., call itself, either directly or indirectly). Programs that read the 
intermediate form should check that no recursion occurs. This can be achieved by retaining a single flag with 
each symbol to indicate whether the symbol is currently being instantiated; the flags are initialized to "false.” 
When a symbol is about to be instantiated, we check the flag; if it is "true,” we have detected recursion, print 
an error message and do not perform the call. Otherwise, we mark the flag "true," instantiate the symbol as 
specified, and mark the flag "false" when the instantiation is complete.] 

Layer settings are preserved across symbol calls and definitions. Thus, in the sequence: 

LNM; 

S6 20 0; 

C 57 T45 13; 

DS 114...; 

DF; 

LNM; 

S3 0 O; 
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the second LNM is not necessary, regardless of the specification of symbols 57 and 114. 


Deleting symbol definitions : Deiete Definitions greater than or equal to 100; (or DD100;) 

The DD command signals the program reading the file that all symbols with indices greater 
than or equal to the argument to DD can be "forgotten” — they will not be instantiated 
again. This feature is included so that several intermediate form files can be appended and 
processed as one. In such a case, it is essential to delete symbol definitions used in the first 
part of the file both because the definitions may conflict with definitions made later and 
because a great deal of storage can usually be saved by discarding the old definitions. 


The argument to DD that allows some definitions to be kept and some deleted is intended to 
be used in conjunction with a standard "library” of definitions that a group may develop. 
For example, suppose we use symbol indices in the range 0 to 99 for standard symbols 
(pullup transistors, contacts, etc.) and want to design a chip that has 2 student projects on it. 
Each project defines symbols with indices 100 or greater. The CIF file will look like: 

/Definitions of library symbols; 

DS 0 100 1; 

/ ...definition of symbol 0 in library; 

DF; 

DS 1 100 1; 

/ ...definition of symbol 1; 

DF; 

/ ...remainder of library; 

/Begin project 1; 

DS100 100 1; 

/ ...first student's first symbol definition; 

DF; 

DS109 100 1; 

/ ...first student's main symbol definition; 

DF; 

Cl09 T403 -110;/ call on first student's main symbol; 

DD100;/Preserve only symbols 1 to 99; 

/Begin project 2; 

DS100 100 1; 

/ ...second student's first symbol definition; 

DF; 

DS113 100 1; 

/ ...second student’s main symbol definition; 

Cl T-3 45;/Call on library symbol, still available; 

DF; 

Cl 13 T401 0:/ call on second student's main symbol; 

E 


User expansion : 3 ’SYMBOL.LIBRARY'; 5 :NONSTANDARD DESIGN RULES: LAMBDA = 4.0; 



[Ch.4, Sect.2: OF by R. F. Sproull and R. F. Lyon ]<Con\vay>imp2.v1si January 28, 1978 10:16 AM 


Several command formats (any command starting with a digit) are reserved for expansion 
by individual users; the authors of the intermediate form agree never to use these formats in 
future expansions of the standard format. For example, private expansions might provide 
for (1) requesting that another file be "inserted" at this point in the processing, thus 
simplifying the use of symbol libraries; (2) inserting instructions to a preprocessor that will 
be ignored by any program reading only standard intermediate form constructs; or (3) 
recording ancillary information or data structures (e.g., circuit diagrams, design-rule check 
results) that are to be maintained in parallel with the geometry specified in the style of the 
intermediate form. 

Comments'. (HISTORY OF THIS DESIGN:); 

The comment facility is provided simply to make the file easier to read, [it is possible to 
deactivate any number of commands by simply enclosing them within a pair of parentheses, even if they already 
include balanced parentheses.] 

End Command : End of file. 

The final E signals the end of the CIF file. [Programs that read OF should gave an error message if 
the file ends without an End Command, or a warning if more text other than blanks follows the E.] 
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Transformations (see also reference R2) 

When we are expanding a symbol, we need to apply a transformation to the specification of 
an item in the symbol definition to get the specification into the coordinate system of the 
chip. There are three sorts of measurements that must be transformed: distances (for 
widths, lengths), absolute coordinates (for "points” in all primitives) and directions (for 
boxes). 


Distances are never changed by a symbol call, because we allow no scaling in the call. Thus 
a distance requires no transformation. 


A point (x,y) given in the symbol is transformed to a point (x',y') in the chip coordinate 
system by a 3x3 transformation matrix T: 

[x' y' 1] = [X y 1] T 

[It is a good idea to check either the last column of T, or the 1 at the end of the transformed vector, even 
though they never need to be computed.] 


T is itself the product of primitive transformations specified in the call: T = T1 T2 T3, 
where T1 is a primitive transformation matrix obtained from the first transformation 
primitive given in the call, T2 from the second, and T3 from the third (of course, there may 
be fewer or more than 3 primitive transformations specified in the call). These matrices are 
obtained using the following templates for each kind of primitive transformation: 


Tab. 


M X. 


M Y. 


R a b. 


Tn = 1 0 0 

0 1 0 

a b 1 


Tn = 


-1 

0 

0 


0 0 
1 0 
0 1 


Tn = 1 0 

0 '-1 

0 0 


0 

0 

1 


Tn = 


a/c b/c 

-b/c a/c 

0 0 


0 

0 where c = Sqrt(a^+b^) 
1 


Transformation of direction vectors (x y) is slightly different than the transformation of 
coordinates. We form the vector [x y 0], and transform it by T into the new vector [x' y’ 
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0]. The transformed direction vector is simply (x* y’). [Note that some output devices may require 
rotations to be specified by angles, rather than direction vectors. Conversion into this form may be delayed 
until necessary to generate the output file. Then we calculate the angle as ArcTan(y/x), applying care when 
x=0.] 

Nested calls require that we combine the transformations already in effect with those 
specified in the new' call. Suppose we are expanding a symbol a, as described above, 
transforming each coordinate in the symbol to a coordinate on the chip by applying matrix 
Tac. Now we encounter, in a's definition, a call to b. What is to happen to coordinates 

specified in b? Clearly, the transformations specified in the call will yield a matrix Tba 

that will transform coordinates specified in symbol b to the coordinate system used in 
symbol a. Now these must be transformed by Tac to convert from the system of symbol a 
to that of the chip. Thus, the full transformation becomes 

[x' y* 1] = [x y 1] Tba Tac 

The two matrices may be multiplied together to form one transformation Tbc = (Tba Tac) 
that can be applied to convert directly from the coordinates in symbol b to the chip. This 
procedure can be carried to an arbitrary depth of nesting. 

To implement transformations, we proceed as follows: we maintain a "current 

transformation matrix" T, which is initialized to the identity matrix. We use this matrix to 
transform all coordinates. When we encounter a symbol call, we: 

1. "Push" the current transformation and layer name on a stack. 

2. Set layer name to ZZZZ. 

3. Collect the individual primitive transformations specified in the call into the 
matrices T1, T2, T3 etc. 

4. Replace the current transformation T with T1 T2 T4 ... T; i.e., premultiply 
the existing transformation by the new primitive transformations, in order). 

5. Now process the symbol, using the new T matrix. 

6. When we have completed the symbol expansion, "pop" the saved matrix and 
layer name from the stack. This restores the transformation to its state 
immediately before the call. 
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Decomposing Wires Into Boxes 


The following algorithm for decomposing wires into boxes was developed by Carver 
Mead, and first implemented at Caltech by Ron Ayers; it was was further modified to be 
consistent with the use of direction vectors, to allow more general path lengths, and to 
avoid use of trigonometric functions. [Note that this decomposition covers more area than the 
locus of points within w/2 of the path for small angles of bend, but less area for sufficiently sharp bends; 
in particular, if a path bends by 180 degrees (reverses) it will have no extension past the point of reversal (it 
is missing a full semicircle). Other decompositions are possible, and may better approximate the correct 
shape.] 

Let the wire consist of a path of n points p 1 ,...,p n . 

Let w represent the width of the wire. 

"Initialization:" 

iF n = 0 then done; "no path" 
if n = 1 THEN 

{MAKEFLASK[Diameter «- w, Center «- pj ]; "single-point gets a flash"; 
done;}; 
i *- 1; 

OldEntension «- w/2; "initial end of wire" 

Segment <- p 2 - p 3 ; "pj and p 2 are points in path, Segment is a vector (a point)" 
"LoopConditions:" 

FOR pj, p i+1 in path until p j+] is last DO 

"calculate the box for the segment from pj to p j + 1 :" 

if p i + ] is last then { Entension «- w/2; "final end of wire" } 

ELSE 

{ "compute Entension for intermediate point:" 

NextSegment <- p j+2 - p i+1 ; "next vector in path" 

T «- matr ix[ x[Segment], -Y[Segment], 
y[Segment], x[Segment] ]; 

"T transforms Segment to +x axis." 

Bend «- multiply '[ NextSegment, T ]; "relative direction vector" 

"if Bend is (0 0), delete p i+] , reduce n, and start over" 

Entension *- w/2 * ( ABs[Y[Bend]] / 

( LENGTH[Bend] + ABs[x[Bend]] ) ); 

}; 

Makebox [ { Length «- LENGTH[Segment] + Entension + OldEntension; }, 

{ Width «- w; }, 

{ Center <- (pj + p i+1 )/2 + ( Segment / LENGTH[Segment] ) * 
(Entension - OldEntension)/2; }, 

{ Direction «- Segment; "careful, may be zero vector" } ]; 

i «- i + 1; 

OldEntension «- Entension; 

Segment <- NextSegment; "next vector in path" 

endloop; 

done; 
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Fig. 19a. Box Representation in Intermediate Form 



Fig. 19b. Other Items in the Intermediate Form 


T transforms Segment to the +X axis 



Simitar triangles BCD, EFG, BFH 
BC:CD:DB :: EF:FG:GE :: BF:FH:HB 
FG = FB + BG 

= BH * (BC/DB) + BG 
= (1 + BC/DB) * BG 
BG = FG / (1 + BC/DB) 

= GE * (BC/DB) / (1 + BC/DB) 

= GE * (CD / (DB + BC)) 
w/2 * Y[Bend] / (LENG1 H[Bend] + X[Bend]) 


fig. 19c. Converting Wires to Boxes 


(cif 19.press) 
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The Multi-Project Chip 

Insight into integrated system design is most quickly gained by actually carrying through to 
completion several LSI design projects, each of increasing scope. A large, complex VLSI 
system could be quickly and successfully developed by designers able to easily implement 
and test prototypes of its subsystems. The separate subsystems can be implemented, tested, 
debugged, and then merged together to produce the overall system layout. However, such 
activities are only practical if a scheme exists for carrying out implementation with 
minimum turnaround time and low procedural overhead per project. 

In this section we describe procedures for organizing and implementating many small 
projects by merging their layouts onto one multi-projecl chip. Then each designer of a 
small project or subsystem need not carry the entire procedural burden involved in 
maskmaking and fabrication. We also include a collection of practical tips and hints that 
may prove useful to those undertaking their first projects or organizing their first multi 
project chips. While the details in this section are specific to present maskmaking and 
fabrication technology, they nevertheless give a feeling for the sort of things that must be 
done to implement projects in general. In a later section we discuss how multiple project 
implementation might be done in the future. 

Figure 20 contains a photomicrograph of a Caltech class project chip containing 15 separate 
student projects. The individual projects were simply merged together onto one typically 
sized chip layout, approximately 3 mm by 4 mm, and implemented simultaneously as one 
chip type. Most of these projects are prototypes of digital subsystems designed using the 
methodology of this text. By implementing a small "slice" of a prototype subsystem array, 
one can verify that its design, layout, and implementation are correct, and measure its power 
and delay characteristics as yielded by the particular fabrication process, thus gaining almost 
as much information as would be obtained by implementing the full array. 

Following fabrication, the wafers containing such multi project chips are scribed, diced, and 
ihen divided up among the participants. The typical minimum fabrication run is about 10 
wafers, each ~7.5 to 10 cm in diameter. Thus even a minimum run provides a few thousand 
chips, and each participant ends up with many chips. Participants may then each package 
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Fig. 20. Photomicrograph of a Caltech Class Project Chip 
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their chips, bonding the package leads to the contact pads of their individual project. Since 
most such projects are relatively small in area, yields are unusually high: if a project’s design 
and layout have been done correctly, most of the corresponding chips will work. 

Organizing a multi-project chip involves: (i) creating the layout of a starling frame, into 
which the various projects are to be merged, (ii) gathering, relocating, and merging the 
project layouts into the starting frame to create one design file and generating from this the 
PG files for the overall project chip, and (iii) documenting various parameters and 
specifications to be used during maskmaking and fabrication. 

The starting frame contains all the auxiliary portions of the chip layout: scribe lines, 
alignment marks, line width testers (critical dimension marks), and test patterns. The 
starting frame may contain fiducial marks on each mask level if these are not to be placed 
by the mask house, and in some cases may contain a parity mark on each level to mark the 
appropriate reticle side and orientation during step and repeat reduction. A tip: placing a 
mask level name or symbol somewhere within the chip's scribe line boundary on each level 
helps prevent the fatal error of level interchange at some time during project merging, 
maskmaking, or fabrication. 

The contents of this starting frame must be carefully worked out to meet the requirements 
and constraints of the chosen mask house and fab line. The important factor of turnaround 
time for the entire mask and fab sequence may be reduced to some extent by repeatedly 
using a relatively standard starting frame which then becomes familiar to all those involved. 
Some typical values for the time involved: 2 to 3 weeks for maskmaking, and then 3 to 4 
weeks for fabrication, longer if large work queues exist at the mask or fab firms. 

When a multi-project chip is scheduled, a tentative chip partition for each project can be 
negotiated among the participants. Project design and layout can then proceed, with 
iterations on the space allocation being done right up till the final merging. The gathering 
and merging of project layout files into one design file is simplified if they are in a common 
intermediate form . Projects may then be relocated to their respective partitions of the chip, 
displayed, plotted, or otherwise checked, using minimum and consistent software operating 
upon manageable sized files. When the project chip appears correctly organized, pattern 
generator (PG) files are produced and written on a mag tape to be sent to the mask house. 


An alternative to the merging of projects at the intermediate form level, is the relocation 
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and merging of their PG fiies. However, the PG files for major designs, containing fully 
instantiated artwork, become unwieldy in size even at today's complexity. The PG file 
merging scheme is workable for projects of small to moderate size, and does provide a 
contingency plan for including projects having alien intermediate forms. If designs are 
relocated and merged at the PG level, additional software should be provided for displaying 
or plotting the chip at that level, so that merging errors may be spotted. A tip: it is a good 
idea in any case to have some bounds checking to prevent stray items of one project from 
clobbering another. 

A thought: the interface between design groups and mask houses would be cleaner if design 
files in a common intermediate form, such as CIF, rather than PG files were used to 
transmit designs to the patterning process. Files would be much smaller. The use of data 
links would be eased. The process to convert and sort design files into PG files, involving 
patterning mechanism dependent optimization, would be appropriately located: in 
association with the particular patterning mechanism. 


Examples of Multi-Project Chips: 

The above concepts and some further possibilities may be clarified by examining the details 
of some specific examples. Figure 21 illustrates a collaborative Xerox PARC/Caltech 
multi-project chip set [organized by D. Fairbairn, D. Johannsen, R. Lyon, J. Rowson, S. Trimberger], The 
figure was produced as a software blowback from the PG file, of the metal level of this chip 
set. Projects in the set ranged in scope from the test of a few cells of an experimental, low 
power shift register [C. Sequin. U. C. Berkeley, and R. Lyon, Xerox PARC], up to a complete content 
addressible cache memory system [D. Fairbairn]. 

Although several of the projects in the set are fairly large, all were individually designed to 
yield chip sizes packagable in standard 40 pin packages, which can hold chips up to ~ 7 mm 
square. The pattern generator at the intended mask house was a GCA/D.W,Mann 3600, and 
the photorepeater was a Mann 3696. Together, these can produce lOx reticles having field 
sizes as large as 10 cm square, and can reduce, step, and repeat these at a maximum of 10mm 
x,y intervals onto masks. Therefore, the 3600/3696 can provide masks for square chips up 
to 10 mm (10,000jLi) on a side. A 10mm square chip can hold the patterns of several 
normally sized chips. By including interior scribe lines in the starting frame, as indicated 
in figure 21, one reticle set can be patterned on the Mann 3600 to contain a number of 



















































































[Ch.4: Implementing Integrated System Designs: Sect.3]<Coriway>imp3.vlsi January 30, 1978 4:33 PM 


different chips, each of which may contain more than one project. When masks are made, 
each reticle is photorepeated at intervals in x,y corresponding to its outer dimensions minus 
some scribe line overlap. In the example in figure 21, the x,y stepping distances were both 
-9700 microns. Fabricated wafers are scribed and diced on all scribe lines, including the 
interior ones, to yield chips of typical sizes. One of the projects, on the lower left chip in 
figure 21, is an experimental charge coupled device array [R. Davies]. The CCDs rode along 
on this chip set to obtain working masks for use in a completely different process 
technology (triple poly) from the standard nMOS the other projects used. 

Figure 22 provides a higher magnification PG file software blowback of the region near the 
center of the left scribe line of the chip set. Alignment marks and line width testers 
(C/D’s) were placed in this region, as noted in the figure. Software blowbacks of individual 
mask levels, more closely resembling the reticles and masks than would a composite design 
checkplot of all levels, are useful in conveying such location information to the mask and 
fab houses. Parity marks were not needed on the reticles for this project chip set. Fiducial 
marks were placed on the reticles by the mask house. Since the software converting the 
design files to PG files had just been constructed prior to organizing this chip set, reticle 
blowbacks were requested before proceeding further with maskmaking to verify that 
everything through pattern generation had worked correctly. 

Some other practical details: Participants in the chip set shared some of the commonly used 
layout items normally required in any project. Examples were input contact pads with 
attached "lightning arrestor" circuits to protect the input MOSFET gates, and output drivers 
snaked around and attached to output pads. Even at current device sizes, pads occupy a 
large fraction of the chip area for large collections of projects, and participants tend to 
make the pads as small as their bonding skill allows. A square pad ~75/int on a side is a 
rather small bonding target, and 125fim on a side is easier for the novice to hit. Perhaps 
-lOOjum square pads separated by ~75|tim is a good compromise, and these should be at least 
25fim from any other metal lines to avoid shorting the lines when bonding. 

The scribe lines on this chip set were laid out as 140/im wide cuts down to 160^xm wide 
paths on the diffusion level, to provide lanes free of oxide for scribing or sawing. Metal 
paths 30jam wide were then laid out straddling the boundaries of these scribe lines, to 
provide electrical contact front the substrate to the metal during the etching of the metal 
layer. Since all the projects on this chip set were prototype designs, and were not intended 
to be placed in extended use, the chips were not overglassed. Eliminating the overglassing 
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Fig. 22. Close-Up View of Figure 21 Kcgion Containing Mask & Fah Information 
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meant that a mask level for defining cuts through overglassing over the contact pads and 
scribe lines was not needed, reducing maskmaking costs. On the other hand, the chip set 
included a mask level to pattern the thin gate oxide, to provide buried contacts between 
diffusion and poly that do not require metal coverage as does the butting contact. Such 
buried contacts enable more compact layouts, but are subject to a rather complex set of 
design rules, require an extra mask level, and sometimes reduce yield and reliability. 

Deleting the overglassing process step also made it possible to electrically probe interior 
points on the chips during testing, probing small metal test pads included in the layouts. 
Such pads must be placed with care, however, because they hang relatively large capacitances 
onto circuitry and slow it down. Note that test pad probing requires special jigs and a stereo 
microscope, and that it is only possible to directly probe the metal layer. Testing uncovered 
chips may also require reduced light levels. The operation of dynamic circuits, i.e. those 
which use a pass transistor input into a gate having no other electrical connection, can be 
severely affected by light. Light induces leakage currents in the n-p junction between source 
and drain regions and the substrate. At room temperature, charge stored on dynamic nodes 
can be retained for many milliseconds in the absence of light. However, in normal room 
light the retention time is reduced to tens of microseconds. Thus care should be taken to 
avoid high light levels when long clocking periods are used. Dynamic memory chips are 
packaged in opaque black packages because of this effect. 

A software blowback of the metal mask PG file of another project set, organized at Caltech, 
is shown in figure 23. The total area of this multi-project chip set is ~ 1 cm^. It is 
subdivided into four major sections: The lower right quadrant contains the OM2 Data 
Engine described in Chapter 5, layed out using A - 2.5pm. The upper right quadrant 
contains a 16 by 16 bit multiplier with on-board accumulator [by Rod Masumoto, Caltech], also 
using A = 2.5p.m. The lower left quadrant contains a subsystem, laid out using A = 2.9pm, 
which converts output from one port of a computer memory into the red, green, and blue 
analog signals for driving a color TV monitor. The upper left quadrant contains 28 projects, 
mostly from students in an LSI Systems course at Caltech. Other small projects are located 
along the left edge of the multiplier, and in the unused area within the TV subsystem 
project. The source material for this project chip set was generated on three different 
computer systems, in two different languages. Check plotting and viewing were done on 
three other systems. In addition to the Caltech projects, this chip set contains projects from 
Carnegie-Mellon University, Washington University (St. Louis), University of California, 
Irvine, and the Jet Propulsion Laboratory. Approximately 500,000 pattern generator 
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F-'ig. 23. Multi-Project Chip Set Organized at Caltech 
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rectangles were required to pattern the reticles for the five mask levels used in this project 
set. Conversion from intermediate form to PG files required ~10 CPU hours on the Caltech 
DECsystem 20. 

The masks for the multi-project chip sets shown in figures 21 and 23 were produced by 
Silicon Valley mask houses from PG tapes, accompanied by PG file software blowbacks 
showing the locations of auxiliary layout items to be used during implementation, and by 
spec sheets containing a list of mask and fab specifications and parameters. These spec 
sheets contain two types of information: 

(i) that which the mask house will need for reading the PG tape, generating the reticles, and 
stepping the master masks. This includes whether dimensions are in Metric or English 
units, whether fiducials and parity marks have been laid out or are to be placed by the mask 
house, desired reticle magnification (usually 10X, sometimes 5X), the x,y step and repeat 
distances, the type and magnification of reticle blowbacks desired, and whether maskmaking 
beyond reticle generation is to be contingent upon blowback inspection. This information is 
independent of the chosen fab line. 

(ii) that which is specific to the fab line, or lines, on which the wafers will be fabricated. 
Examples here are the number, size, and type of working plates desired, and the 
photographic polarity of the working plates, i.e. whether they are a positive or negative 
image of the PG pattern. The polarity of the working plates depends on the process step 
and on whether positive or negative resist is used. In addition, it is customary to specify 
how much, if any, the lines in the image will be expanded or contracted to compensate for 
growth or shrinkage of regions due to the process. This so-called "pulling" of line widths in 
maskmaking may begin as far back as at pattern generation. Thus, while the patterning and 
fabrication processes are design and layout independent, they are usually coupled, and masks 
made for a run on one fab line are not necessarily useable elsewhere. 

Maskmaking and patterning technology will remain in a state of transition for years to 
come. The present shift is front contact printing with working plates to projection 
alignment using original master masks. These two alternatives are illustrated in figure 24. 
From the system designer's point of view, at the interface to the mask and fab firms, they 
present no essential differences, requiring perhaps slightly different specs, and yielding 
different intermediate artifacts. In the next section we discuss the future evolution of these 
technologies, presenting several implementation schemes likely to become commonplace over 
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Fig. 24. Present Maskmaking and Fabrication: Two Alternatives 
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the next decade. These schemes will enable fabrication of systems much denser and faster 
than present ones. However, the basic concepts of the design methodology will still apply. 
Remembering our film processing analogy, we will have "finer grain" and "faster" film 
available as time passes. However, the basic art of photography remains. 


Patterning and Fabrication in the Future 


As X is scaled down toward its minimum value, ultimately limited by the physics of 
semiconductors to about 0.1/xm, it will become feasible to implement single chip, maximum 
density VLSI systems of enormous functional pow'er. Patterning and fabrication at such 
small values of X requires that certain fundamental problems be overcome^. In this section 
we will discuss alternative solutions to two of the major problems: At values of X of ~2 fim, 
a problem of runout is encountered, causing successive patterning steps to misalign over 
large regions of the wafers. This problem is solved by using less than full wafer exposure. 
At values of X under 0.5 /xm, the wave length of light used in photolithography is too long 
to allow sufficient patterning resolution. This problem is solved by using non-optical 
lithography, exposing the resist with electron beams or x-rays. 

Historically, silicon wafers have been patterned using full wafer exposure, i.e. using masks 
which covered the entire surface of the wafer. The pattern for one layer of one chip is 
stepped and repeated during the fabrication of the mask itself, so that the mask contains the 
patterns for a large array of chips. During the fabrication of each successive layer on the 
wafer, that layer’s mask is aligned at two points with the pattern already on the wafer, and 
the entire wafer then exposed through the mask. In the future, as feature sizes are scaled 
down, full wafer exposure will not likely be possible for reasons developed in this section. 

The earliest integrated circuits, circa 1960, were fabricated using wafers of 2.5 cm diameter, 
and typical chips were 1 to 2 mm, with a minimum feature size of ~25ftm. In 1978, 
production wafers are 7.5 to 10 cm, typical commercially manufactured LSI chips are 5 mm, 
and minimum feature size is ~5/i. The concurrent development of ever finer features sizes 
and larger wafer sizes has placed an increasingly severe strain on the process of full wafer 
exposure. The reasons lie in the physics of wafer distortion. 


When a wafer is heated to a high temperature, it expands by an amount determined by the 
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thermal coefficient of expansion of silicon. A bare wafer will contract exactly the same 
amount upon cooling, and will therefore remain exactly the same size. Suppose, however, 
that a layer of SiO-j is grown on the wafer when it is at the high temperature. The thermal 
coefficient of expansion of Si02 is approximately 1/10 that of silicon. As the wafer is 
cooled, the silicon will shrink at a rate much greater than that of the Sif^. Normally the 
resulting wafer will not be flat, but convex on the SiOj side. If the wafer is cooled slowly 
enough, it is possible to "relieve” the stress induced by the difference in thermal contraction. 
Wafers in which such stress relief has been achieved are nearly flat but are, of necessity, a 
different size than they were originally^ 3 ’* 3 . 

It might seem that subsequent masks could be scaled to just match the wafer distortion 
introduced up to the appropriate point in the process. Unfortunately no such correction can 
be introduced without a knowledge of the pattern of Si0 2 on the wafer. During cooling, 
dislocations are induced in the underlying silicon crystal at the edges of openings in the 
oxide pattern. Hence, the magnitude and direction of wafer distortion is dependent in 
complex ways upon the thickness and distribution of Si0 2 on the surface and upon the 
details of the thermal cycle. While it is in principle possible to compute a geometric 
correction for each pattern to be produced, it is clearly not possible to apply one correction 
for all possible patterns. Misalignment between subsequent layers due to distortion of this 
type is often referred to as runoui. Runout due to wafer distortion is today the largest 
single contributor to misalignment between masking steps. Attempts to use finer feature 
sizes, which require more precise alignment, on larger wafer sizes, which induce larger 
distortions, seem doomed to failure unless full wafer exposure is abandoned. 

Two attractive alternatives to full wafer exposure are now being explored: (i) electron beam 
exposure, and (ii) exposure using step and repeat of the chip pattern directly on the wafer. 

A scanning electron beam system can be used to expose resist material, and is also capable 
of sensing a previous pattern on the surface of a wafer. The beam can initially scan an area 
covering the alignment marks of a particular chip. Information gained from this sensing 
operation can be used to compute the local distortion, and the chip can be exposed in nearly 
perfect alignment using these computed values. The process can be repeated for each chip 
on the wafer, until all have been exposed. 

This technique has several virtues. No masks are required. A digital description of the chip 
can be exposed directly onto a silicon wafer. A different chip can be placed at each chip 
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location, and this opens up the possibility of greatly extending the multi-project chip 
concept. However there are also limitations. Data is transferred serially. Even at the 
highest data rates which can be conveniently generated, a long time is required to expose 
each chip. More fundamentally, the physics of electron beam interactions places severe 
restrictions on the minimum practical feature size attainable. When a beam of electrons 
enters a resist-coated wafer, scattering occurs both in the resist and in the wafer. This 
backscattering contributes a partial exposure at points up to a few microns away from the 
original point of beam impingement, and has a number of implications: 

(i) The exposure, or spatial distribution of energy dissipation, varies with depth in the 
resist. Thus resist cross section is not readily controllable. 

(ii) Exposure at any particular point depends on all patterns exposed within a few microns. 

This is known as the "cooperative exposure” or "proximity” effect and necessitates pattern- 

0 

dependent exposure corrections 0 . 

(iii) Exposure latitude becomes narrower as the spatial period of a pattern is reduced. This 
is illustrated in figure 25, which shows the rise in background level exposure as a function 
of lateral distance for four different spatial periods: (a) 2ftm, (b) lp.m, (c) 0.5/im, (d) 0.3fim. 
The beam diameter is 250 angstrom units, the energy lOkeV, the resist thickness 0.4jtim. The 
consequences of this background rise are particularly troublesome for high-speed, low- 
contrast resists. Experimental results show somewhat greater line broadening than predicted 
by the rnodel^. 

For the above reasons, the writing time and the difficulty of exposing desired geometries 

9 

increase rapidly for linewidths below about 0.5 micron . 

An immediate prospect for achieving feature sizes of 1-2/i.m with large wafers is offered by 
stepping the chip pattern directly on the wafer rather than on a mask. This technique 
avoids the serial nature of the electron beam writing by exposing an entire chip at once. 
Using good optical systems it has been possible for many years to produce patterns with 
feature sizes in the range 1 to 2/im. Recent progress in the design of optical projection 
systems may even make 1/2 to 3/4 micron line width patterns over several millimeter 
diameter areas practical^. Techniques are known for using light to achieve alignments to a 
small fraction of a wavelength. Recently, an interferometric optical alignment technique 
has demonstrated an alignment precision of 0.02 micron and should be capable of a 



Figure 25. Electron Beam Exposure of Resist on Silicon. 
Monte Carlo Calculation of Exposure Level at a Silicon-Resist Interface, 
as a Function of Lateral Distance, for Four Spatial Periods, 
[contributed by H, I. Smith, Lincoln Laboratory, M.I.T.] 
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reregistration uncertainty less than 0.01 micron 11 . It would seem that devices of ultimately 
small dimensions (0.25/i.ni) could be fabricated using optical alignment, it must be stressed 
that a realignment to the underlying pattern must be done at each chip location to achieve 
the real potential of the technique. 

The step-and-align technique can be extended to ultimately small dimensions by 
substituting an x-ray source for the optical one, while retaining the automatic optical 
alignment system. X-rays require a very thin mask support, e.g. Mylar, upon which a heavy 
material such as gold or tungsten is used as the opaque pattern. Interactions of x-rays with 
matter tend to be isolated, local events. No back-scattering of the x-rays occurs, and 
electrons produced when an x-ray is absorbed are sufficiently low in energy that their range 
is limited to a small fraction of a micron. For this reason, patterns formed by x-rays in 
resist materials on silicon wafers are much cleaner and better defined than those attainable 
by any other known technique (see figure 26). X-rays of very high intensity can be 
efficiently obtained from the synchrotron radiation of an electron storage ring. The time 
required for exposing a chip with such a source is no more than that required at present 
using optical exposures. Both optical and x-ray techniques have the property that the total 
exposure time per wafer can be made independant of how much of the wafer is exposed at a 
step. Therefore, the only penalty in a step and align process is the time required for 
mechanical motion and alignment. 

It appears that we have in hand all of the techniques for ultra fine line lithography, even on 
larger silicon wafers. Both electron beam and optical stepping work must, however, focus 
on local alignment as the crucial step in achieving high density, high performance LSI. 

We now describe a production lithography system for ultimately small dimensions. A major 
component of the system is a 500 to 700 MeV electron storage ring, approximately 5 meters 
in diameter, shaped in the form of a many sided polygon. The electron beam within this 
storage ring is deflected at each vertex by a superconducting magnet. This deflection results 
in a centripetal acceleration of the electrons, and hence in an intense tangential emission of 
synchrotron radiation. The most important component of such radiation is soft x-rays in 
the 280 to 1000 eV quantum energy range (wavelengths of 0.004 to 0.001 /ini). Such x-rays 
are idea! for exposing resist materials with line widths in the 0.1/xni range ’ . 

One exposure station is fitted to each vertex of the storage ring. Each exposure station has 
an automatic optical alignment system for individual alignment of each chip 11 . Coarse 
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alignment is controlled by a laser interferometer and the wafer brought into position by 
ordinary lead screws moving a conventional stepping stage such as those in current 
photorepeaters. Auxiliary alignment features are placed on each mask level within each 
chip. Misalignment of two such patterns on the wafer relative to those in the mask 
produces Moire patterns which are detected by photosensors and fed to a computer system. 
Piezoelectric transducers driven by the computer system bring the w'afer into final 
alignment under the mask. Each exposure station in such a system is capable of aligning 
and exposing one layer of one chip every few seconds. Each chip may contain of the order 
of 10^ devices, which is the equivalent of several wafers at today's scale. 

An overview of the possible routes from design files to finished chips with sub-micron 
layout geometries is shown in figure 27. In the immediate future, alignments much better 
than those achievable today will be possible with the optical step and align technique 
(leftmost path in figure 27). In addition, this scheme eliminates the step and repeat process 
in mask making, enabling considerably shorter turnaround time. The rightmost path, direct 
electron beam writing on the wafer, promises the ultimate in short turnaround time. It can 
be viewed as using the fab area as a computer output device. For high volume 
manufacturing, at ultimately small dimensions, the center path as described above will most 
likely become the workhorse of the industry. 
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Fig. 2 7 Some Future Patterning and Fabrication Alternatives 
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Fully Integrated, Interactive Design Systems 
{ in preparation } 

- - - creating a data structure which allows the various levels of interactive processes to 

operate on the same data base - nodes, transistors, cells, and instances - operations 

on the data base - - - interactive logic transfer function tests - - - interactive circuit 
transfer function tests - interactive design rule checking - the filing problem - 


System Simulation, Test Generation, and Testing 
{ in preparation } 

- - - system-level/register-transfer-level design description and simulation - - - testing 

the system design - practical strategies for structured VLSI system development - 

designing for testability - - - generation of test sequences - - - testing the chips - 
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Chapter 5: The Design of a Data Processing Engine 

Copyright © 1978, C.Mead, L.Conway 

Sections: 

The Overall Structure-The Arithmetic Logic Unit-ALL' Registers-Buses- 

Barrel Shifter - - - Register Array - - - Communication with the Outside World - - - 
Machine Operation Encoding - - - functional Specification of the Machine 


Up to this point, we have chosen simple examples to illustrate the fundamental properties of 
integrated systems, and the type of design methodology which can be used to build 
hierarchically organized, complex systems. In order to more fully clarify some of these 
techniques, we will now study the design of a simple data processing structure: the data path 
from a microprogrammed 16-bit machine, undertaken as a university project in 
experimental computer architecture. The "Our Machine" (OM) project was started in 1976 
by Carver Mead as part of the LSI Systems course at Caltech. Early contributions were 
made by Mike Tolle [Litton industries], while attending this course. Other participants were 
Caltech students Dave Johannsen and Chris Carrol, w'ith much inspiration from Ivan 
Sutherland. By December 1976, a first design (OM 0 ) was nearly completed. The 
participants decided at that time that the design had become "baroque" and ugly, and it was 
scrapped. A new design (OM1) w f as completed by March 1977 by Dave Johannsen, Chris 
Carrol, and Rod Masumoto. Fabricated chips were received in June 1977. It was this chip 
w'hich appeared in the September 1977 Scientific American article by Sutherland and Mead. 
The chip was fully functional except for a timing bug in the dynamic register array (which 
had been designed in departure from the structured design methodology developed in this 
text). A complete redesign of the chip was undertaken in June 1977, by Dave Johannsen. 
By September, a complete set of new cells had been constructed, and the design was 
completed by December. Cells from this chip and its companion, the controller chip 
described in chapter 6, were used as examples in chapter 3. The redesign included 
improvements in the encoding of the microcode control word, and rigorously applied the 
structured design methodology. 

The chapter is presented in two separate parts. The first part outlines the architectural 
requirements for the chip, and illustrates how the design methodology was applied to satisfy 
them. The second part is a precise functional description of the chip, intended as a user 
manual for those who microprogram the machine. A more complete discussion of the 
overall system architecture is given in chapter 6. 
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The Overall Structure 

The basic requirements initially established for the machine were that it be gracefully 
interconnectable into multiprocessor configurations, that it be microprogrammable, so that 
OP code sets can be configured to the application at hand, that it be able to do variable 
field operations for emulation instruction decoding, assembly of bit-maps for graphics, etc., 
and that its performance be as fast as possible. 

In order to satisfy the first requirement, it was decided that the machine would initially 
have two ports: one to be used for a system interconnection, and the other for local 
memory, I/O, etc. It was perceived that in many systems much time is lost in assembling 
two operands for most operations, so it was decided that the machine have two internal 
buses, and that any registers in the machine be two-port registers. The requirement for 
gracefully handling variable length words required a shifter at least sixteen bits long, and the 
last requirement dictated an arithmetic logic unit of considerable flexibility while not 
sacrificing speed. The strategy was adopted initially that the two buses would run through 
the actual processing array, from one end of the chip to the other. One port was to be 
located at the left end of the chip, and the other port at the right end, and the two system 
buses were to run the full length of the chip between the two ports through the actual 
register and data processing array. 

The three main central, functional blocks in the machine were the register array, the shifter, 
and the arithmetic logic unit. It was decided to run the control lines vertically in metal, 
and the buses horizontally in polysilicon, and that power, ground, and timing signals would 
run parallel to the control signals. At this point, it is already possible to make a rather 
detailed sketch of the general layout of the chip. This arrangement is shown in Fig. 1. The 
details of these functional blocks will be described in subsequent sections. Included are 
descriptions of peripheral circuits needed to interface subsystems with each other and to the 
outside world. 
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The Arithmetic Logic Li nit 

il was believed that the carry chain would limit the performance of the system and 
therefore, the carry chain and its associated logic w'as the first functional block to be 
designed in detail. Simulations of several look-ahead schemes indicated that they added a 
great deal of complexity to the system without much gain in performance. For this reason 
it was decided early in the project to implement the fastest possible Manchester type carry 
chain ( reference 4, chapter 1), similar to that shown in chapter 1, figure 11. The carry 
chain and its associated logic were allowed to dictate the repeat distance of the cells in the 
vertical direction. In MOS technology, a Manchester carry chain is particularly limited in 
its ability to propagate a high carry signal. However, it is quite fast in its ability to 
propagate a low carry signal. 

In any arithmetic logic unit there will be a null period when the OP code for the next 
operation is being brought in. Advantage can be taken of this null period to precharge the 
carry chain and other sections of the machine where Liming is particularly crucial. In this 
way, it is not necessary to propagate high signals through pass transistors where the rise 
transient W'ouid be particularly slow. It was decided to apply this strategy to OM’s ALU, and 
the resulting carry chain is shown in Fig. 2. 

The main carry chain runs through the pass transistor from carry-in to carry-out. The 
carry-in signal is detected by the gate of an inverter which feeds the signal into the 
subsequent logic of the ALU. Three transistors are used to control the state of the carry-out 
of each stage. The first one merely precharges the node associated with carry-out during the 
null period of the ALU. The second is the carry-kill signal w'hich is derived from the 
inputs to the ALU, and simply grounds the carry-out through a single transistor. The third 
is a pass transistor which causes carry-out to be equal to carry-in. These last two signals 
associated with the carry chain in each stage, carry-kill and carry-propagate, are generated 
by two NOR gates which have kill-bar and propagate-bar as one input and precharge as the 
second input. Hence, it is assured that the kill signal and propagate signal are disabled 
during the null period w'hen the precharging takes place. 

After some analysis, it was decided that nearly all interesting combinations of carry-in and 
the input signals could be generated using propagate and carry-in from each stage. Thus the 
carry-chain itself may be viewed as a logic block with two inputs, carry-kill bar and carry- 
propagate bar, two outputs, propagate and carry-in, a vertical signal carry-in and carry-out. 



[Ch.5., Sect 1} 



Figure 2. Carry Chain Circuit for the Arithmetic Logic Unit. 
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Figure 3. Abstraction of the Carry Chain Circuit. 
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4 


and one control wire, precharge, as shown in Fig. 3. 

The task of designing the balance of the ALU is now reduced to that of designing 
functional blocks to; a) combine the two input variables to form a propagate bar and kill 
bar, b) to combine carry bar and propagate to form the output signal, and c) drivers for 
controlling the logical function blocks and deriving a timing for precharge. 

A number of random logic implementations of function blocks for deriving kill, propagate, 
and the output were attempted. All seemed to be at variance with the horizontally 
microprogrammed architecture of the machine, and required a large amount of area and 
power. For this reason it was decided to use the general logical function block illustrated in 
chapter 3, figure 12a. Such circuits are used to generate carry-bar, propagate-bar, and for 
combining carry-bar in and propagate to form the output. The circuit implements sixteen 
logic functions of two input variables, and is shown in Fig. 4. It consists of a set of 
transistors which fully decode the input combination of A and B, and connect one and only 
one of the vertical control lines to the output, depending on this input combination. Thus, 
for example, when A and B inputs are both low, the vertical control wire labelled G 0 is 
connected to the output. The truth table entries for the desired logic function are placed on 
the G vertical control wires, and the output will then be the desired logic function of the 
two input variables. For example, if the Exclusive-OR of A and B is desired, a logic-0 will 
be applied to the control wires 0 and 3, and logic-1 will be applied to control wires 1 and 2. 
Since it is desired to implement the same logic function on all bits of the word, the control 
variables G 0 through G 3 need not be generated in every bit slice, but may be generated once 
at either the top or bottom of the array. The functional abstraction of the circuit of Fig. 4 
is shown in Fig, 5. 

We are now in a position to form the block diagram for our complete arithmetic logic unit, 
as shown in Fig. 6. The functional dependence of the output on the two inputs and the 
state of the carry is determined by a 12-bit number: P Q through P v K Q through K v and R 0 
through R v together with the carry-in to the least significant bit of the ALU. The ALU is 
quite general, and its detailed operation set may be left unbound until the control structure 
of the machine is designed at a later time. 

There are two general principles illustrated by this design. First, it is often less expensive in 
area, time, and power to implement a general function than to implement a specific one. 
Secondly, if a general function can be implemented, the details of its operation can be left 
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unbound until later, and hence, provide a much cleaner interface to the next level of design. 
The detailed choices of which functional entities to leave unbound and which to bind early 
requires a considerable amount of judgment, and is where much of the skill in integrated 
system design lies. 

Two details need to be dealt with before the arithmetic logic unit function block is 
complete. Drivers are needed for the P, K, and R terms which will generate signals with the 
appropriate timing. In addition, inverters must be interposed in the carry chain occasionally 
to minimize the propagation delay through the entire carry chain. The way we have chosen 
to implement the interposition of inverters is to recognize that each carry chain function 
block contains two inverters which present at the output carry-in, having been twice 
inverted from the actual carry-in signal. If we merely substitute this signal for the carry¬ 
out signal from the pass transistor, we have doubly inverted in buffered our carry-in and 
buffered it to minimize the propagation delay. This approach avoids putting spaces between 
the carry function blocks for inverters. It is illustrated by the dotted connection lines in 
Fig. 2. In the actual implementation, the connection through the inverters was made in 
every fourth stage. 

Drivers for the P, K, and R terms have the following function: At some time during the 
null period of the ALU (which we shall call <pp, an OP code specifying each of the terms 
arrives at the input to the driver. It must be latched while the ALU itself is being 
precharged, and then it must be applied to the P, K, and R terms as soon as the ALU is 
activated. The P, K, and R function blocks are themselves composed of pass transistors, and 
their outputs are more effectively driven low than high. For this reason, we will precharge 
the outputs of the P, K, and R function blocks as well as the carry chain itself. This is most 
conveniently done by requiring that all of the P, K, and R control signals be high during the 
null period of the ALU. Then, independent of the states of A and B inputs, the outputs will 
be charged high by the time ALU active period commences. The control buffer which 
implements this function is shown in Fig. 7. 


The OP code is latched through a pass transistor whose gate is connected to <p^, and the OP 
code runs into a NOR gate, the other input of which is also <pj. Thus, the output of the 
NOR gate is guaranteed to be low during the period. The NOR gate output is then run 
through an inverting super-buffer, so that during <p± the output is guaranteed to be high. 
At the end of <p lt whatever OP code is present at the input of the NOR gate is transferred 
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to the particular P, K. or R term being driven. The only interface specification for the 
ALU which must be passed to the next level of system design is that the P, K, and R terms 
be valid before the end of <pj, and that the A and B inputs likewise be valid by the end of 
qpj and be stable throughout <p 2 - the active period of the ALU. We are then guaranteed that 
after enough time has passed to allow the carry to propagate, the output of the R function 
block will accurately reflect the specified function of the ALU and may be latched at the 
end of <p 2 - 


ALL Registers 

In order for the arithmetic logic unit described in the last section to be useful, it must be 
equipped with a set of registers both for its input variables and for its output. Let us 
consider the input registers first. Inputs to the ALU may be derived from either the shifter, 
the buses, or other sources. They may be latched and left unchanged during any machine 
cycle or set of machine cycles. This is one of the situations in which combining the 
multiplexing function with the latching function simplifies the design and achieves better 
performance. A register operating in this manner is shown in Fig. 8. 

The input to the first inverter can be derived front four sources: three internal sources such 
as shifter output, bus, etc., and a fourth, the output of the second inverter. When it is 

desired to latch a new signal into the register, one of the source pass transistors is driven 

high during qpj. The feedback transistor around the two inverters is always activated during 
<P 2 - Thus, with three vertical control wires plus the <p 2 timing signal, it is possible to select 
one of three sources into the register, or none of the three sources, thereby leaving the 
previous value of the register stored on the gate of the first inverter during the <p^ period. 
Since it is necessary to have two inverters to form the stable pair when the feedback 
transistor is on, both the input and its complement are available as required by the P and K 
function blocks of the arithmetic logic unit. The OP code signal which selects which source 
will be applied to the ALU input register during must come in during the previous <p 2 . 
Each of the select signals must be low during <p 2 , and at most one of them may come high 
during the following <pi. A driver appropriate for these control signals is shown in Fig. 9. 
The control OP code is latched during <p 2 , during which time the NOR gate shown disables 
the output driver. Since the output driver in this case is non-inverting, the output select 
line is held low during all of <p 2 - At the end of <p 2 - the OP code signal is latched and the 

particular select line to be enabled that cycle is allowed to go high. 
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Note that this timing allows two incoming OP code bits per external wire per machine cycle. 
In particular, if it were desirable to share a microcode bit between the ALU function and 
the ALU selector inputs, this could be done by bringing the ALU OP code in during <pj and 
the ALU input selection code in during <p 2 , as shown in Fig. 10. This technique was 
suggested by Ivan Sutherland. 

The ALU output register is similar to the ALU input register, except the timing is reversed. 
The result of the ALU operation is available at the end of <p 2 . 

An OP code bit will, if desired, enable the latch signal to go high during <p 2 . The feedback 
transistor is always enabled during qpj, and thus the latch is effectively static even though in 
the absence of a latching signal the data is stored dynamically on the gate of the first 
inverter through the <p 2 period. Once again, both the output and its complement are 
available if desired. 


Buses 

An early design decision was that data would flow through the machine on two buses which 
communicate with all of the major blocks of the system. We have already seen that the 
ALU performs its operation during the <p 2 period and does not have valid data to place into 
its output register until the end of <p 2 . If data are to be transferred from the output register 
of the ALU to its input register, this must be done during the ^ period. If we adopt a 
standard timing scheme in which all transfers on the buses occur during <p lf we can make 
use of the <p 2 period when the ALU is performing its operation to precharge the buses in 
the same manner that the carry chain was precharged during the period. In this way we 
solve one of the knotty problems associated with a technology designed for ratio logic. If 
we had insisted that the tristate drivers associated with various sources of data for a bus be 
able to drive up as well as down, we would have required both a sourcing and sinking 
transistor, together with a method for disabling both transistors. While it is perfectly 
possible to build such a driver (we shall undertake the exercise as part of the design of the 
output ports), it is a space-consuming matter to use such a driver at every point where we 
wish to source data onto an internal bus. By using the bus precharge scheme, our tristate 
drivers become simply two series transistors as shown in Fig. 11. 
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Here the data from one source, for example the ALU output register, is placed on the gate 
of one of the series transistors. An enable signal which may come high during cp^ is placed 
on the other series transistor. If one and only one of the enable signals is allowed to come 
high during any one <pj period, the bus can be driven from as many sources as necessary. 
The performance of such a bus is limited only by the pull-down capability of the two series 
transistors. We shall adopt this philosophy for the processor chip we are designing, and 
attach such a tristate driver to each of the output registers for the ALU. 


Barrel Shifter 

Since shifting is basically a simple multiplexing function, it might be thought that a shifter 
could be combined with the input multiplexer to the ALU. A simple 1-bit, right-left 
shifter implemented in this manner is shown in Fig. 12. 

It is identical with the three-input ALU register, and the three inputs have been used to 
select between the bus, the bus shifted left by one, and the bus shifted right by one. To 
support the multibit shifts necessary for field extraction and building up odd bit arrays, 
something more is required. One is tempted initially to build up a multibit shift out of a 
number of single shifts. However, for word lengths of practical interest, the n z delay 
problem mentioned in Chapter 1 makes such an approach unworkable. 

The basic topology of a multibit shift dictates that any bus bit be available at any output 
position. Therefore, data paths must run vertically at right angles to the normal bus data 
flow. Once this simple fact is squarely faced, a multibit shifter is seen as no more difficult 
than a single bit shifter. A fundamental circuit which allows any bit to be connected to any 
output position is shown in Fig. 13a. It is basically a crossbar switch with individual MOS 
transistors acting as the crossbar points. In principle this structure can be set to interchange 
bits as well as shift them, and is completely general in the way in which it can scramble 
output bits from any input position. In order to maintain this complete generality, the 
control of the crossbar switch requires n 2 control bits. In some applications, this n 2 bits 
may not be excessive, but for most applications a simple shift would be adequate. The gate 
connections necessary to perform a simple barrel shift are shown in Fig. 13b. The shift 
constant labelled SC, is presented on n wires, one and only one of which is high during the 
period the shift is occurring. If the shift outputs, SH0,l,2,etc., are precharged in the same 
manner as the bus, the pass transistors forming the shift array are only required to pull 
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down the shift outputs when the appropriate bus is pulled to low by its tristate drivers. 
Thus, the delay through the entire shift network is minimized and effective use is made of 
the technology. 

A second topological observation is that in every computing machine, it is necessary to 
introduce literals from the control path into the data path. However, our data path has been 
designed in such a way that the data bits flow horizontally while the control bits from the 
program store flow vertically. In order to introduce literals, some connection between the 
horizontal and vertical flow must occur. It is immediately obvious in Fig. 13 that the bus is 
available running vertically through the shift array. It is then the obvious place to 
introduce literals into the data path or to return values from the data path to the controller. 

At the next higher level of system architecture, the shift array bit slice may be viewed as a 
system element with horizontal paths consisting of the bus, the shifter output, and if 
necessary, the shift constant since it appears at both edges of the array, as well. The literal 
port is available into or out of the top edge of the bit slice, and the shift constant is 
available at the bottom of the bit slice. These slices, of course, are stacked to form the 
entire shift array as wide as the word of the machine being built. 

One more observation concerning the multibit shifter is in order. We stated earlier that our 
machine was to be a 2-bus machine. Therefore, any bit slice of a shifter such as the one 
shown in Fig. 13 will of necessity have two buses running through it rather than one. We 
chose to show only one for the sake of simplicity. There remains the question of how the 
two buses are to be integrated with the shifter. Since we are constructing a two-bus 
machine, we have two full words available, and a good field extraction shifter would allow 
us to extract a word which gracefully crosses the boundary between tw'o machine words. 
The arrrangement shown in Fig. 13 performs a barrel shift on the word formed by one bus. 
For the same number of control lines and pass transistors, only having added the bus lines 
which are required for the balance of the machine anyway, we may construct a shifter which 
places the words formed by the two buses end to end and extracts a full-width word which 
is continuous across the word boundary between the A and B buses. This function is 
accomplished in as compact a form as just described with a circuit shown in Fig. 14. Notice 
that the vertical wires have a split in them. The portion of the wire above the 
corresponding shift output being connected to the A bus, and that below the corresponding 
shift output to the B bus. 




Figure 13b. 4-By-4 Barrel Shifter. 
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It can be seen by inspection that this circuit performs the function shown in Fig. 15 which 
is just what is required for doing field extractions and variable word length manipulations. 
The literal port is connected directly to the A bus and may be run backwards in order to 
discharge the bus when a literal is brought in from the control port. A block diagram which 
represents the shifter at the next level of abstraction is shown in Fig. 16. 

In order to complete the shifter functional block, it is necessary to define the drivers on the 
top and bottom which interface with the system at the next higher level. Let us assume that 
the literal bus from outside the chip will contain data which are valid on the opposite phase 
of the clock from that of the internal buses. In that case, a very simple interface between 
the two buses which will operate in either direction is shown in Fig. 17. 

The internal shifter output is precharged during <p 2 , and active during <pj. It may be 
sourced either from the literal bus or from the shifted combination of the A and B buses 
through the shift array, shown in Fig. 15. The external literal bus itself may be sourced 
either from the opposite end (the external paths from the program source) or from the end 
attached to the A-Bus in the shift array shown. 

The bus to the external literal path is precharged during <p^, and data from the literal port 
of the shifter are enabled onto it by a signal active during <p 2 , as shown in Fig. 17. The two 
signals, * IN, and y 2 * OUT, are derived from buffers identical to those shown earlier. 
The shift constant itself is represented by one line out of n, which is high, the others 
remaining low. Buffers for these lines are identical to those shown in Fig. 9. 

There is one more observation concerning the n-bit shift constant. It is represented most 
compactly by a log n bit binary number. However, in order to generate from such a form a 
signal that can be used in the actual data path, a decoder is required to convert the binary 
number into a one-of-n signal suitable for feeding the buffers. Decoders can be made in a 
number of ways in the ratio technology we are discussing. The most common form is the 
NOR form, which is merely the fully decoded equivalent of the AND-plane in the 
programmable logic array. Chapter 3. It is shown in Fig. 18. Notice that the output is a 
high-going one-of-n pattern. 

Decoders can also be made in other forms. For small values of n, the NAND form shown 
in Fig. 19 is often convenient. We used a variant of this form for the ALU function block 
described earlier. Notice that the output of this form, when used as a decoder, is a lowgoing 
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one-of-n pattern. There is also a complementary form of decoder which can be built with 
ratio technology, and was suggested by Ivan Sutherland. It takes advantage of the fact that 
in any decoder both the input term and its complement must be present. In this case, the 
input term can be used to activate pull-up transistors in series, while the complement can be 
used to activate pull-down transistors in parallel. This logic form is similar in principle to 
that used with fully complementary technologies, and has similar benefits. It can generate 
either a highgoing or a lowgoing one-of-n number, and dissipates no static power. A 
decoder of this sort is shown in Fig. 20. Once w'e have added the appropriate buffers and 
decoders to our shift array, we have a fully synchronized function block ready to be 
integrated with the system at the next level up. The properties of this block are shown in 
Fig. 21. 


Register Array 

In any microcoded machine designed for emulating an instruction set at a higher level, it is 
convenient to have a number of miscellaneous registers available, both for working storage 
during computations and for storing pointers of specific significance in the machine being 
emulated: stack pointers, base registers, program counters, etc. Since the machine is a two- 
bus machine and the ALU is a two-operand device, it is convenient if the registers in our 
machine are two-port registers. Using the design philosophy we have been discussing, a 
typical two-port register cell is shown in Fig. 22. This register is a simple combination of 
the input multiplexer described earlier, the yj fedback transistor, and two tristate output 
drivers, one for each bus. The registers can be combined into an array m bits long and n 
bits wide, the buses passing through the array. Hach cell of the array is defined at the next 
level up, as showm in Fig. 23. Drivers for the load inputs and the read outputs are identical 
to those shown in Fig. 9. While we could immediately encode the load and read inputs to 
the registers into log n bits, we shall delay doing so until the next level of system design. 
There are a number of sources for the A bus besides the registers, and we will conserve 
microcode bits by encoding them together. 

Before we proceed, there is one mundane detail which must be taken care of in the overall 
topological strategy. The routing of VDD and Ground must generally be done in metal, 
except for the very last runs within the cells themselves. Often the metal must be quite 
wide, since metal migration tends to shorten the life of conductors if they operate at current 
densities much in excess of 1 ma per square micron cross-section. Thus, it is important to 
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have a strategy for routing ground and VDD to all the cells in the chip before doing the 
detailed layout of any of the major functional blocks. Otherwise, one is apt to be faced 
with topological impossibilities because certain conductors placed for other reasons interfere 
with the routing of the VDD and ground. A possible strategy for the overall chip layout 
shown in Fig. 1 is shown in Fig. 24. 

Notice that the VDD and ground paths form a set of interdigitated combs, so that both 
conductors can be run to any cell in the chip. Any strategy will do, but it must be 
consistent, thoroughly thought through at the beginning, and rigidly adhered to during the 
execution of the project. 


Communication with the Outside World 

Although in particular applications the interface from a port of the machine to the outside 
world may be a point to point communication, the ports will often connect to a bus. Thus 
it is desirable to use port drivers which may be set in a high impedance state. Drivers which 
can either drive the output high, drive the output low, or appear as a high impedance to the 
output are known as tristate drivers. Such drivers allow as many potential senders on the 
bus as necessary. Figure 25 shows the circuit for a tristate interface to a contact pad. 

Here, either bus A or bus B can be latched into the input of a tristate driver during 
Likewise the pad may be latched into an incoming register at any time independent of the 
clocking of the chip. Standard tristate drivers are enabled on bus A and B. The only 
remaining chore is the design of the tristated buffer which drives the pad directly. Details 
of the tristate driver are shown in Fig. 26. 

The terms out and outbar are fed to a series of buffer stages which provide both true and 
complement signals as their outputs, and are disabled by a DISABLE signal. Note that this 
DISABLE signal does not cause all current to cease flowing in the drivers, since the pull-up 
transistors are depletion type, but reduces the current to a value where it can be handled by 
the disable transistor of the following buffer stage. In general there will be a number of 
super buffer stages of this sort. The very last stage of the driver is shown in Fig. 26b. It is 
not a super buffer but employs enhancement mode transistors for both pull-up and pull¬ 
down. These transistors are very large in order to drive the large external capacitance 
associated with the wiring attached to the pad. They are disabled in the same manner as the 
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super buffers, except that when the gates of both transistors are low, the output pad is truly 
tristated. Once again the two output transistors are a factor of approximately e larger than 
the last super buffer in the buffer string. 

As we have seen, the inverter string necessary to transform the impedance from that of the 
internal circuits on chip to that sufficient for driving a pad attached to wiring in the 
outside world is quite large, and imposes a delay of some factor times a logarithm of this 
impedance ratio upon communications between the chip and the outside world. Any help 
which can be obtained in making this transformation is of great value. For example, the 
latch and buffers associated with the input bus circuit to the pad drivers can themselves be 
graded in impedance level, so that by the time the out and outbar signals are derived, they 
are at a considerably higher current drive capability than the buses. Note that the buses are 
a considerably larger capacitance than minimum nodes on the chip, and thus the initial 
latch buffers can be larger than typical inverters on the chip. All such tricks help to 
minimize the number of stages between the bus and the outside pad, and thus the total delay 
in going off chip. 


Machine Operation Rncoding 

By now we have defined a complete functional data path with ports on each end and 
functional blocks through the center, as shown in Fig. 27. The op code bits required to 
control the data path and the phase of the clock on which they are latched are shown. 
There are forty-nine such bits together with the four asynchronous bits for latching and 
driving the pad to the external world. In addition, there are the carry-out wire and the 
sixteen literal wires. These sixty-six wires together with the thirty-two front the left and 
right port must go to and come from somewhere. Schemes for encoding internal machine 
operations into OP codes of various lengths are well known, and will not be discussed here. 
At one extreme all the OP code wires can be brought out to a microcode memory driven by 
a micro program counter and controller, in which case all operations which can be done by 
the machine may be done in parallel. The opposite extreme is to very tightly encode the 
operations of the machine into a predefined OP code set. In the present machine, this 
encoding would be most conveniently done by placing a programmable logic array or set of 
programmable logic arrays along the top and the bottom of the machine data path. A 
condensed OP code could then be fed to the programmable logic arrays which in turn 
sequence the data path through the microinstructions required for executing machine code. 



[Ch.5., Sect.1 ] 


Drv A (PHI 2) 
Lch A (PHI 2) 


Drv A (PHI 2) 
Lch A (PHI 2) 


Load Pad 
I 3 


Left 

Port 


f. 


Wrt A Rd A 

(PHI 2) (PHI 2) 

4' 4* 

Literal 

(PHI 2) 

I’ 6 

A Sel 

(PHI 2) 

4L 

P K 

(PHI 1) 

4? 4‘ 

Cout 

(PHI 2) 

t’ 

R 

(PHI 1) 

4? 

Lch 

Rd 

A (PHI 1 

A (PHI 2) 

P 

16 


Shift 


ALU 


Kill 

Carry 

Result 


ALU 

Registers 


Array 


Input 


and 

Chain 

Control 


Out 





Reg. 


Prop. 




Reg 







Control 






'f' 

•1 X 

4' 

4" 

4- 

'b 

1 4 

U 

I4 

12 

li 

» 3 

Wrt B 

Rd B 

Shift 

B Sel 

Cin 

Lch B (PHI 1) 

(PHI 2) 

(PHI 2) 

Constant 

(PHI 2) 

(PHI 1) 

Rd B (PHI 2) 


(PHI 2) 


Drv B (PHI 2) 
Lch B (PHI 2) 


Drv Pad 


Load Pad 


£ 


Right 

Port 


t 


Drv B (PHI 2) 
Lch B (PHI 2) 
Drv Pad 


Figure 27. Block Diagram of Datapath with Control Wires Added. 


(ch5fig27.sil) 













[Ch.5: Design of OM: Sect.l]<Cunway>doml.vlsi February 4, 1978 5:51 PM 


The important point of the design strategy we have chosen is that we can orthogonalize the 
design of the data path and the design of the OP code set in such a way that the interface 
between the two designs is very well defined, very dean, and can be described precisely, in a 
way that system designers at the next higher level can understand and work with 
comfortably. The data path can then be viewed as a component in the next level system 
design. As time progresses and it is possible to construct chips with larger and larger 
functional density, blocks of the sprt shown will form cornponents in even larger 
geometrical arrangements which will form even larger components and a whole hierarchy 
will emerge which will implement a system function at a much higher level than 
contemplated here. However, if the design strategy we have described is followed, it is 
possible to construct arbitrarily large and complex systems which are guaranteed to work if 
the individual component blocks are correct, and given the clocking period is sufficient to 
allow the slowest functional unit to perform its function. 

Using the approximate capacitance values given at the end of Chapter 2, an estimate of the 
minimum clock period for the machine can be made. The Phase 1 time of the machine is 
~50t, the same as the general estimate given in the section "Transit Times and Clock 
Periods" in chapter 1. However, the Phase 2 time of the machine is limited by the carry 
chain, as discussed earlier in this chapter. The relative areas of metal, diffusion, and gate 
can be estimated from the ALU layout shown in Figure 6a. The metal and diffusion occupy 
~15 and -8 times the area of the propagate pass transistor gate, respectively. Metal is -0.1 
and diffusion is typically 0.2 times the gate capacitance per unit area. Thus the total 
capacitance of each stage of the carry chain is -4.5 times that of the pass transistor gate. 
The effective delay time is correspondingly longer than the transit time r of the transistor 
itself. The effective delay through n stages of such pass transistor logic is - m 2 . In the 
OM2, n=4 and the effective delay for 4 bits of carry chain is -4.5*16r * 72t. To this must 
be added the delay of the doubly inverting buffers at the end of every 4 bits of straight 
Manchester logic. This delay is (1+k) times the transit time of the inverter pulldown, 
properly corrected for stray capacitance in the inverter. Here the inverter ratio k is - 8, 
since its input is driven through the pass transistors. Conservatively, strays in such a circuit 
are always several times greater than the basic gate capacitance, and we may estimate the 
inverter delays at -30r. The total carry time is thus -100 times the transit time for each 
block of 4 ALU stages. The total Phase 2 time should then be -400t. In 1978, t - 0.3 ns, 
and we would expect a minimum total clock period of -450t, or -135 ns. 
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The Second Half of this Chapter contains a functional specification of the OM2 
machine, by Dave Johannsen of Caltech. This specification was originally documented 
in Display File #111l, by Dave Johannsen and Carver Mead of the Caltech Computer 
Science Department, and copyrighted by Caltech. The specification is reprinted here 
with the permission of the California Institute of Technology. 



Functional Specification of the OM2 Machine 

[Seaton contributed b\ David I . Johannsen, Caltech] 


Introduction 


This specification describes a 16-bit data engine referred to as OM2 [#986]. The OM2 
contains 16 registers, an ALU, and a 32-bit shifter, and is designed as part of a micro¬ 
programmed writeable-control-store machine. The companion chip is the Controller chip, 
which contains the program counter, stacks, and so on. The Controller is described in 
Chapter 6. The entire system is designed to run on a single 5 volt supply. 

The OM2 Datachip has two data ports for communication with the external system and a 
communication path to the Controller chip. The data ports are tri-state with either internal 
or external control. Communication with the Controller consists of a 16-bit literal port and 
a single flag bit. Seven control bits come directly from the microcode memory. 

The system runs on a single clock. When the clock is high, the internal buses transfer 
data: when the clock is low, the ALU is performing its operation. Microcode bits enter the 
Datachip the phase before that code is to be executed. Therefore, the bus transfer code 
enters the Datachip when the clock is low, and the ALU code enters when the clock is 
high. Figure 1 shows a possible OM system. 

Throughout this section a positive logic convention is used. A "1" refers to a high voltage 
level, while a "0" refers to a low voltage level. 

Datapaths 

A block diagram of OM2 is shown in figure 2. There are two buses which connect the 
various elements of the chip together. These buses transfer data while the clock is high, 
the period referred to as <p1. During <p2, when the clock is low, the buses are precharged. 
Each bus can only get data from one source, and give data to one destination during any 
one cycle. 
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The Left and Right Ports communicate between the datachip and the outside world. The 
Right Port has been traditionally known as the memory bus port while the Left Port has 
been the system bus port, but since the two ports are identical, this is an arbitrary 
convention. Each port has both an input latch and an output latch to provide facilities for 
synchronizing the datachip to the outside buses. Under program control either of the two 
buses can load the output latch during qp 1. There are three modes of driving data from the 
output latch to the pins, two of which are under program control and one of which is under 
hardware control. The first method is to output the data as soon as it comes from the bus, 
during the same <p1. The second method is to latch the data from the bus during <p 1 and 
drive it out during the following <p2. The final method is to latch the data from the bus 
during <p1, but output the data when an enable pin is pulled low. The enable pin would be 
controlled by a bus manager, and can be asynchronous with respect to the datachip. 
Inputting from the port is similar. By pulling down on another enable pin, data from the 
external bus is loaded into the input latch, which can be read later under program control. 
Alternatively, the microcode can force the data currently on the external bus into the 
internal bus during the current <p 1. With this scheme, many types of synchronous and 
asyncronous buses may be interfaced to OM2s. For internal control only, the external 
enable pins can be left floating. 

Registers 

The registers are static and dual port. Any one of the 16 registers may source either or 
both of the buses, while any one of the 16 may be the destination for either bus, but not 
both. There are only two restrictions to the use of the registers: 

1. One register may not be the destination for both buses on the 
same cycle, and 

2. One register may not be both the source for one bus and 
the destination for the other bus on the same cycle. 


Shifter 

The shifter concatenates the two buses, resulting in a 32-bit word, with the A bus being 
the more significant half. The shift constant then selects the bit position where the 16-bit 
output window starts. The shift constant specifies the number of bits from the B bus 
present in the output (ie. a shift constant of 0 returns the A bus, while a shift constant of 
15 returns the LSB of the A bus in the MSB of the output, followed by all but the LSB of 
the B bus in the rest of the word). A conceptual picture of the shifter is shown in figure 3. 
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The ALU can select as inputs either the bus, the shift output, or shift control. If shift 
control is selected, the entire word is 0 except where the LSB of the A bus appears in the 
shift output. The shifter operates on qpl; it may be viewed as an extension of the buses. 


ALL 


A block diagram of a single bit of the ALU is shown in figure 4. The ALU operates on the 
data which is contained in its two input latches. Input latch A may be loaded from the A 
bus, the shifter output, or the shift control, while the input latch B may be loaded from the 
B bus, the shifter output, or the shift control. 

The outputs of the two latches become the inputs to two function blocks which determine 
what will happen on the carry chain. Function block P determines whether the carry chain 
propagates, while K decides if it is to kill the carry. If neither are true, the carry chain 
generates a carry. Each function block has four control inputs, which, for the Propagate 
function block, are referred to as PFF, PFT, PTF, and PTT. If PFF is enabled, the P block 
output is high if both input latches are false (contain 0). Enabling PFT activates the output 
if input A is false and input B is true, and so on. If, for example, both PFF and PFT are 
enabled, the output is active if input A is false, regardless of the state of input B. To 
further illustrate the operation of the function blocks, consider addition. If both inputs 
contain a 1. the carry is to be generated, while if both inputs are 0, the carry is killed. If 
the two inputs are different, the carry is to be propagated (carry out«-carry in). To do this 
operation, the kill output should be active if both inputs are false, so KFF is enabled. Both 
PFT and PTF should be enabled to propagate properly. Therefore, K=(KFF, KFT, KTF, 
KH>(1,0,0,0), and P=(PFF, PFT, PTF, PTT)=(0,1,1,0). 

The result of the ALU is produced by the R function block, which has as inputs P block out 
and Carry in. For the addition example above, the output should be the exclusive-or of P 
and Cin, so R=(0,1,1,0). P, K. and R values for common ALU operations are listed in the 
programming section. 

Two ALU output latches (A and B) can be loaded from the R block output; either one may 
later be used to source either bus. 

Flags 

The carry input to the LSB of the ALU is a logical combination of a flag bit and two control 
inputs. The two control inputs can force the carry in to be either 1 or 0. or they can 
select either flag or flag bar as the input. 
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There is also a method for doing conditional ALU operations under the control of a two-bit 
conditional OP field. A conditional operation performed by the ALU is not only a function 
of the control inputs, but also of the flag bit. The conditional operation control forces 
some of the control inputs low, regardless of what the P. K, and R microcode says. The 
coding for conditional operations allows the use of operations like multiply step and divide 
step without the necessity for branching in the microcode. 

There is a 16-bit flag register which can also be a source or destination of the A bus. 
This register can also be loaded with the ALU flags during qp2. The ALU flags include carry 
out, overflow, carry in to the MSB, zero, MSB, LSB, Less than. Less than or equal to, and 
Higher (in unsigned value). The last three flags are comparison flags used after a 
subtaction. For example, after subtracting ALU input latch B from ALU input latch A, the 
"less than" flag is true if the value contained in ALU input latch B was larger than the value 
in ALU input latch A. 

The MSB of the flag register is called the flag bit, and this bit may be modified every <p 1 
by loading it with the value of one of the other bits of the flag register. The flag bit is 
used in the calculation of carry in and modification of conditional ALU Ops. This bit is also 
sent to the controller chip to be used for conditional branching, etc. 

Literal 

The one remaining datapath is the literal port. It is used to send data from the datachip to 
the controller, and vice versa. It is a source or destination for the A bus. When the literal 
port is being used, standard bus operations are suspended for that cycle. 

Programming 

The Datachip requires 23 bits of microcode on each phase of the clock. This section of 
the memo specifies the encoding of the fields within that microcode. Figure 5 shows the 
arrangement of the microcode word. 


Bus Transfer 


The bus transfer control bits enter the datachip during tp 2 and are used during the 
following <p1. There are two buses, the A bus and the B bus, which interconnect the 
modules of the Datachip. These two buses are similar in many respects; however, there 
are a few asymmetries as to sources and destinations. Also, when a literal is being 
transferred, the only bus transfer field which is active is the A bus destination, which 
stores the literal entered on the A bus. 
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A listing of the bus sources and destinations follows: 



A Bus Source 


A Bus Destination 

Onnnn 

Register n 

Onnnn 

Register n 

10000 

Right port pins 

10000 

Left port, drive now 

10001 

Right port latch 

10001 

Left port, drive qp2 

10010 

Left port pins 

lOOIx 

Left port, no drive 

1001 1 

Left port latch 

10100 

Right port, drive now 

10100 

ALU output latch A 

10101 

Right port, drive <p2 

10101 

ALU output latch B 

101 lx 

Right port, no drive 

10110 

Flag register 

11000 

ALU input latch A 



11001 

ALU input latch A gets shift out 



11010 

ALU input latch A gets shift 




control 



11011 

Flag Register 



B Bus Source 


B Bus Destination 

Onnnn 

Register n 

OOnnnn 

Register n 

10000 

Right port pins 

010000 

Left port, drive now 

10001 

Right port latch 

010001 

Left port, drive 2 

10010 

Left port pins 

OlOOIx 

Left port, no drive 

10011 

Left port latch 

010100 

Right port, drive now 

10100 

ALU output latch A 

010101 

Right port, drive <p2 

10101 

ALU output latch B 

0101 lx 

Right port, no drive 



0110xx 

ALU input latch B 



1Onnnn 

ALU input latch B gets shift 




output, shift const. =n 



11nnnn 

ALU input latch B gets shift 


control, shift const.sn 

Al.l Input Selection 


The two ALU input latches are destinations for the two buses, as shown in the Bus 
Transfer section above. In addition to being loaded directly from the buses, these two 
latches can be loaded from the outputs of the shift array. The shift constant always comes 
from the 4 least significant bits of the B Bus Destination field, even though the destination 
of the B Bus is not the ALU input latch B. For example, the B Bus may be transferring the 
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contents of register 3 into register 5 while the A Bus is transferring the contents of 
register 4 to the ALU input latch A through the shifter. In this case, the shift constant 
would be ‘'5”, because the 4 least significant bits of the B Bus Destination field contain 
" 0101 ". 


All Operations 


The following table shows coding for ALU operations that are commonly found useful. The 
user is encouraged to encode other operations if these are not suitable.The numbers given 
are the decimal representation of the 4 bit control word. For P and K, 
A’B' = 1 ,A'B=2,AB' = 4,AB=8. For R, P'C'= 1 ,P'C = 2,PC' = 4,PC=8. Cin is the cary in select, and 
Cond is the conditional OP select. 


A + B 

K 

1 

P 

6 

R 

6 

Cin 

0 

Cond 

0 

Add 

A + B-t-Cin 

1 

6 

6 

1 

0 

Add with carry 

A-B 

2 

9 

6 

2 

0 

Subtract 

B-A 

4 

9 

6 

2 

0 

Subtract reverse 

A-B-Cin 

2 

9 

6 

1 

0 

Subtract with borrow 

B-A-Cin 

4 

9 

6 

1 

o' 

Subtract rev. w/borrow 

-A 

12 

3 

6 

2 

0 

Negative A 

-B 

10 

5 

6 

2 

0 

Negative B 

A+1 

3 

12 

6 

2 

0 

Increment A 

B+1 

5 

10 

6 

2 

0 

Increment B 

A-1 

12 

3 

9 

2 

0 

Decrement A 

B-1 

10 

5 

9 

2 

0 

Decrement B 

AAB 

0 

8 

12 

0 

0 

Logical And 

AVB 

0 

14 

12 

0 

0 

Logical Or 

A©B 

0 

6 

12 

0 

0 

Logical Exor 

—<A 

0 

3 

12 

0 

0 

Not A 


0 

5 

12 

0 

0 

Not B 

A 

0 

12 

12 

0 

0 

A 

B 

0 

10 

12 

0 

0 

B 

Mu! 

1 

14 

14 

0 

1 

Multiply step 

Div 

3 

15 

15 

0 

2 

Divide step 

A/O 

0 

14 

12 

0 

3 

Conditional And/Or 

Mask 

10 

5 

8 

2 

0 

Generate mask 
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7 


( urry In Select 

The Carry in select field determines what the carry into the LSB of the ALU will be, 


according to 

the following table: 

00 

0 

01 

Flag bit 

10 

1 

11 

Flag bit complemented 


Conditional Op Select 

The conditional op select field is used to generate 3 basic conditional type operations: 
Multiply, Divide, and And/Or step. In a great many cases, the conditional op allows 
functions dependant on a flag to be performed in one cycle, rather than sending the flag to 
the controller and branching to two separate instructions depending upon that flag. When 
a conditional OP is selected, certain ALU control bits are forced to zero. Which bits are 
zeroed depends on the conditional OP selectand the flag bit, as follows: 


Select 

Flag bit 

K 

p 

R 


0 

X 

— 

— 

— 

Unconditional 

1 

0 

—-o 

—0- 

—0- 

Multiply step 


1 

— 

0— 

0—- 


2 

0 

0--0 

-00- 

-00- 

Divide step 


1 

-oo- 

0--0 

o-o 


3 

0 

— 

— 


And/Or 


1 

— 

-00- 
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Hat;s 

The flag select field determines which of the ALU flags becomes the new flag bit. The 
following table lists the selection options. 


Select 

0 

1 

2 

3 

4 

5 

6 
7 


New Flag Bit 

Old flag bit 
Carry out 
MSB 
Zero 

Less than 

Less than or equal 
Higher (in absolute value) 
Overflow 


The ALU flags are loaded into the flag register under the control of the latching field, bit 3. 
They are loaded into the following positions: 


Bit 


Flag 

0 


Not changed 

1 


Not changed 

2 


Not changed 

3 


Not changed 

4 


Not changed 

5 


Previous value of Flag bit 

6 

—i 

Carry into MSB stage 

7 


Less than or equal 

8 

—i 

Higher (in absolute value) 

9 


Less than 

10 


LSB 

11 


Zero 

12 


MSB 

13 


Overflow 

14 


Carry out 

15 


Current Flag bit 
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laic hint; Field 

The latching field specifies which of four registers should be loaded, as shown in the 
following table: 


Latching Field 

1 XXX 
xl XX 
XX 1 X 
XXX 1 

0000 


Register Loaded 

Flag register loaded with current AL flags 

ALU output latch A loaded with the ALU output 

ALU output latch B loaded with the ALU output 

The Literal field during the next qp2 is loaded with 

the contents of the A Bus during the last <p2 
None of these registers are affected 


Literals 


The two bit literal field specifies when a literal is to be used and which direction it goes. 
If both bits are 0, no literal transaction will occur. If the first bit is 1, a literal will be 
transfered. if the second bit is 1 , the literal goes off chip, while if the bit is 0 , the literal 
comes on chip. 

Programming Examples 

This section of the memo contains 3 programming examples which should provide a better 
understanding of the various datapaths within OM 2 . 

The first example is 16-bit integer multiplication, The two inputs. X and Y, are multiplied to 
produce the result, Z. In the multiply loop, the number X is shifted left and the MSB is 
stripped off. Z is shifted left, then Y is added to the new Z if the MSB of X was a 1. The 
sequence of instructions is repeated 16 times, using the counter in the controller to signal 
when the 16 iterations have been performed. Figure 6 illustrates each step of the loop, 
which is listed here: 

<p2: ALU.Out. A*-ALU( Shi ft left)^ALU.in.A; 

Latch Flags; 

q- 1 : ALU.In.A<~Shift.out, Bus.A<-ALU.Out.B; 

BUS.B<~R[1 ]; ‘-This gives a shift constant of 1. 

<p2: ALU.Out.B<~ALU(Multiply Step); ‘-conditionally add. 

Flag *-C out; 

9 1: ALU.In.A<-Bus.A<-ALU.Out.A 



Figure 6a. Shift X in the ALU, putting the Cout flag into Flagbit. 


(Phi 2) 



Figure 6b. Put Z on Bus A, and shift 1 left in shifter. (Phi 1) 



Figure 6c. Conditionally add Z and Y. (Phi 2) 



Figure 6d. Bring X back around to the ALU input. (Phi 1) 


(figB.sil) 
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The second example will be to generate a parity flag, which is not directly available from 
tlie ALU. Parity is generated by exclusive-onng all of the bits of the data together. If the 
data are loaded into both ALU inputs, with the B input rotated by 1, performing an 
exclusive-or operation will give an output that is the exclusive-or of adjacent bits; bit i of 
the output will be bit i of the input © bit i-7 of the same input. If this same operation is 
performed, this time rotating the B input by 2, bit i becomes / © i-1 © i-2 © i-3. By doing 
this 2 more times, rotating B first by 4 and then by 8, every bit of the output is equal to 
the parity: the exor of all of the bits. The MSB flag is the Parity Odd flag, while the Zero 
flag is the Parity Even flag. The program is listed here, and illustrated in figure 7: 

cpI: ALU.In.A*-Bus.A*-R[0]; ‘-generate the parity of register 0. 

ALU.In.B*-Shift.out( 1); Bus.B*-R[0]; 
cp2: ALU.Out.A*-ALU(Exor); 

(pi: ALU.In.A*-Bus.A*-ALU.Out.A; 

ALU,ln.B+Shift.out(2); Bus.B*-ALU.Out.A; 

(p2: ALU.Out.A*-ALU(Exor); 

(pi: ALU.In.A*-Bus.A<-ALU.Out.A; 

ALU.In.B*-Shift.out(4); Bus.B*-ALU.Out.A; 

<p2: ALU.Out.A*-ALU(Exor); 

ipl: ALU.In.A*-Bus.A<-ALU.Out.A; 

ALU.In.B*-Shift.out(8); Bus.B*-ALU.Out.A; 

(p2: ALU(Exor); 

The third example adds all of the registers to what is in ALU.Out.A. By executing and 
modifying a literal, the registers can be indirectly accessed, which makes this routine 
possible. Figure 8 illustrates the operation of the following code: 

(pi: ALU.In.A*-Literal "Bus.A<-R[1 ]; ALU.In.B<-Bus.B<-ALU.Out.B"; 

<p2: ALU. Out.B+-ALU*-ALU.In. A; 

(pi: ALU.In.A<-Bus.A<-R[0]; 

(p2: ALU.Out.B*~ALU*-ALU.In.A; «-This is just setup, now the loop! 

(pi: Bus.A*-ALU.Out.B; 

ALU.In.B*-Bus.B*-ALU.Out.A; 
y2: ALU.Out.A*-ALU(add); 

Execute Literal; 

<p1 ; ALU.In.A*-A.Bus; «-The rest of this instruction is the literal! 

<p2: ALU.Out.B+'ALU(increment)*-ALU.In.B; 


point to next register. 






J 

Figure 76 . Shifting by 8. Result Has AH Bits Identically the Parity Flag. 


(ficj7 cd sil) 
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Figure 8a. Bring in Control Literal 



Figure 8b. Store in ALU.Out.B 



Figure 8c. Fetch Register 0 



Figure 8d. Clear Sum 

(fig8abcd sil) 
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Figure 8e. Bring Around Sum and Put Control Literal on Bus A 



Figure 8f. Add Current Numbers 



Figure 8g. Register Loaded by Literal Goes to ALU Input A 



Figure 8h. Point to Next Register, Loop to Figure 8e 


(figSefgh sil) 
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ISP Description of the OM2 Datachip^ 


Pin States 
Ip < 0:1 7> 
rp<0:17> 
new.code<0:22> 
flag.pin<0> 
power<0:3> 


left port 

right port 

microcode 

flag to controller 

power, ground, clock , substrate 


Pin Formats 

left.port.data<0:15> 
left.out.async<0> 
left.in.async<0> 
right, port. data<0:15> 
right.out.async<0> 
right.in.async<0> 

I iteraKO: 1 5> 
clock<0> 


= Ip < 0:15 > 

= lp< 16> 

= lp< 1 7> 

= rp < 0:1 5> 

= rp< 16> 

= rp< 1 7> 

= new.code<5:20> 
= power<3> 


Mp State 

reg[0:15]<0:15> 
a.bus<0:1 5> 

a. bus.old<0:15> 

b. bus<0:15> 
left.out<0:15> 
left, i n < 0:1 5> 
right.out<0:1 5> 
right.in<0:15> 
left. out. later<0> 
right.out.later<0> 
alu.in.a<0:15> 
alu.in.b<0:15> 
alu.out.a<0:1 5> 
alu.out.b<0:15> 
old.code<0:22> 
flags<0:15> 


registers 
bus a 

bus a latched for a literal 
bus b 

left pad output latch 

left pad input latch 

right pad output latch 

right pad input latch 

for output during <p2 operations 

for output during <p2 for right port 

alu input latch a 

alu input latch b 

alu output latch a 

alu output latch b 

microcode that came in last phase 

flag register 


Instruction format 

a. source<0:4> 

b. source<0:4> 

a. destination<0:4> 

b. destination<0:5> 
literal, in<0> 
old.iiteral <0:15> 
alu.p.op<0:3> 
alu.k.op<0:3> 
alu.r.op<0:3> 
alu.conditionai<0:1 > 
flag.select<0:2> 
carry.in.select<0:1 > 
latch.flags<0> 
latch, alu. out. a<0> 
latch.alu.out.b<0> 
literal. control<0> 
reg.select. 1 <0:3> 
reg.select.2<0:3> 


old.code<5:9> 

old.code< 16:20> 

old.code<0:4> 

old.code< 10:15 > 

old.code<22> 

old.code<5:20> 

old.code< 19:22> 

old.code< 15:18> 

old.code< 11:14 > 

old.code<9:10> 

new.code<6:8> 

old.code<4:5> 

old.code<3> 

old.code<2> 

old.code< 1 > 

old.code<0> 

a.source<0:3> 

a.destination<0:3> 
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reg.select.3<0:3> 
reg.select.4<0:3> 
select.1 <0> 
select.2<0> 
select.3<0> 
select. 4<0:1 > 
shift.constant<0:3> 
sharay<0:31 > 


Temporary State 
kill.control<0:3> 
propagate.control<0:3> 
result.control<0:3> 
kill < 0:1 5> 
propagate<0;15> 
carry<0:16> 
alu.out<0:1 5> 


Instruction Execution 
Inst ruct ion. execution: = ( 

left. out. async=0=>( left. port, data*- left, out) ;next 
left. in. async = 0=> (left, in Heft. port. data);next 
right.out.async=0=>( right, port.data*-right.out );next 
ri ght. i n. asy nc=0=>( right, in right, port, data); next 
phil (:=clock=t )=>( 
left.out.Iater*-0;next 
right.out.later*-0;next 
literal.in=1 => (a. bus*-old. literal );next 
literal. tn=0=>( 

select. 1 = 0=> (a. bus*-reg[ reg.select.1 ]); 
select.1 = 1 =>( 

reg.select. 1 =0=> (a.bus*-right.in aright.port.data); 
reg.select.1 = 1 =>(a.bus*-right.in); 
reg.select.1 =2=s> (a.bus*-left. inHeft.port.data); 
reg.select.1 =3=>(a.bus*-left.in); 
reg.select.1 =4 => (a. bus*-alu. out. a); 
reg. select. 1 =5=>(a.bus*-alu.out.b); 
reg.select. 1 =6=>(a.bus*-flags);next);next 
select. 3=0=>(b.bus*-reg[reg. select. 3]); 
select.3=1 =>( 

reg.select.3=0=>{b. bus aright, in*- right, port, data); 
reg.select. 3=1 =>(b.bus*-right.in); 
reg. select. 3=2 =>( b.bus*- left. inHeft. port, data); 
reg.select.3=3=>( b.bus*-left, in); 
reg.select.3=4=>(b.bus*-alu.out.a); 
reg.select.3=5=>(b.bus*-alu.out.b);next);next 
select.4=0=>(reg[reg.select.4]*-b.bus); 
select.4=1 =>{ 

reg. select. 4=0=> (left. port, data Heft, out *-b. bus); 
reg.select.4=1 =>( 
left.out*-b.bus;next 
left.out.lateral ;next); 
re g. select. 4=2 => (I eft.out ♦*b.bus); 
reg.select.4 = 3==>(lett.out*-b.bus); 
reg.select.4 = 4=> (right, port, data** right, out *-b. bus); 
reg.select.4=5=>( 

right. out*-b.bus;next 
right.out.lateral ;next); 


= b.source<0:3> 

= b.destination<0:3> 

= a.source<4> 

= a.destination<4> 

= b.source<4> 

= b.destination<4:5> 

= b.destination<0:3> 

= b.bus<0:1 5>Da.bus<0:1 5> 




fCli.5, Seel. 2; 0V12 Spec., In D, I.. Johunnsen ]<Conwa>>doni2.vlsi February 2, 1978 8:08 PM 


reg. select, 4=6 =>( right, out *-b. bus); 
reg.select.4 = 7=>(right.out<-b,bus); 
reg.select.4€{8,9,10,11 }=>{alu.in.b*-b.bus);next); 
seiect.4=2=>(alu.in.b<0:1 5>«* 

sharay< 16-shift. constant:31 -shift.constant>); 
select. 4=3=>(alu.in.b«-2T shift, constant );next);next 
select. 2=0=> (reg[reg. select. 2 ]*-a. bus); 
select.2=1 =>( 

reg.select.2 = 0=> (left. port, data 4 - left, out <-a. bus); 
reg. select. 2=1 =>( 

left.out<-a.bus;next 
left. out. lateral ;next); 
reg.select.2=2=>(left.out<-a.bus); 
reg.select.2 = 3=>( left, out <-a.bus); 
reg. select.2 = 4=> (right.port.data*-right.out<-a. bus); 
reg. select. 2 = 5=*( 

right. out*-a.bus;next 
right.out.lateral ;next); 
reg.select.2 = 6=>(right.out«-a.bus); 
reg.setect.2 = 7=>(right.out*-a.bus); 
reg. select. 2=8=> (alu. in.a*-a.bus); 
reg. select. 2=9 =>(alu.in.a<0:15>*- 

sharay< 16-shift, constant: 31 - shift.constant>); 
reg.se!ect.2 = l 0=>(alu. in. a*-2t shift, constant); 
reg.select.2=11 =>(flags*-a.bus);next);next 
flag. select= i =>(flags< 1 5>«-flags< 14>); 
flag.select=2=>(flags< 15>*-flags< 12>); 
f lag. se lect = 3=>(f lags < 15>«-flags< 11 >); 
f lag. select: 4 =s>(f!ags< 15>«-flags<9>); 
f lag. select=5 =>(f lags < 15> «-flags<?>); 
f lag. select: 6 => (f lags< 15>*-flags<8>); 
flag.select=7=>(flags< 15>«-flags< 13>);next 


phi2(:=clock=0)=>( 

left.out.later=1 => (left. port. dataHeft. out );next 
right.out.later=1 => (right.port.data*-right.out);next 
kill, control «-alu.k.op;next 
propagate, control <-alu.p.op;next 
result, control «-alu.r.op;next 
alu.conditional = 1 =>( 
flags< 15> = 1 =>( 

propagate. control<0>«-0; next 
result.control<0>«-0;next); 
flags< 15>=0=>( 

kill.control<3>*-0;next 
propagate.control<2>«-0;next 
result.control<2>*-0;next);next); 
al u. conditional: 2 =>( 
flags < 15 > = 1 =>( 

kill.control<2>*-0;next 
kill.controK 1 >*-0;next 
propagate, control <3>«-0; next 
propagate.contro!<0><-0;next 
result.control<3>*-0;next 
result.control<0>*-0;next); 
flags< 15>=0=>( 

kill.contro!<3>«-0;next 
kil (.control <0>*-0;next 
propagate. controK 2 >«-0;next 
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propagate. controK 1 > «-0;next 
result.controK2><-0;next 
result.controK 1 ><-0;next);next); 
alu. conditio nal = 3=> ( 
flags< 15> = 1 =>( 

propagate.control<2>«-0;next 
propagate.controK 1 >«-0;next);next);next 
kilKO: 1 5 > *- ( 

ki II. controK 3> A (~’alu.in.a< 0:15>) A(~’alu.in. b<0:15>)V 
kill.controK 2 > A(“ialu.in.a<0:1 5>) Aalu.in.b<0:1 5> V 
kill. controK 1 > Aalu.in.a<0:15> A(~«alu.in.b<0:15>)V 
kill.control<0> Aalu.in.a<0:15> Aalu.in.b<0:1 5>);next 
propagate < 0:15>«-( 

propagate.control<3> A(~’alu.in.a<0:1 5>) A(~>alu.in.b<0:15>) V 
propagate.control<2> A(~’alu.in.a<0:1 5>)Aalu.in.b<0:15>V 
propagate.controK 1 > A alu. in.a <0:1 5> A( 'alu.in.b<0:1 5>)V 
propagate. control<0> Aalu. in. a<0:1 5> Aalu.in.b<0:1 5>);next 
carry <0>*-carry.in.select<1 > ©(carry, in.select<0> A flags < 15>);next 
for k=1 step 1 until 16 do: 

(carry < k >♦-—'(kilKk-1 > + propagate<k-1 >* scarry <k-1 >)+kilt<k-1 >* 
propagate<k-1 >*x);next in OM2, x is 

undefined 

If kill(i) and propagate(i) are both high, the carry chain does funny 

things. 

We represent that here by use of the "x" in the carry function. 
alu.out<0:1 5><-( 

result, control <3 > A(~»propagate<0:15>) A ("^carry <0:15>)V 
result.control<2>A( — ’propagate <0:15>) A carry <0:1 5> V 
result.controK 1 > A propagate <0:1 5> A(~'carry <0:1 5>)V 
result. control<0> Apropagate<0:1 5> Acarry<0:1 5>);next 
latch, alu. out. a=1 =>(alu.out.a«-alu.out);next 
latch.alu.out.b=1 =>(alu.out.b<-alu.out);next 
I iteral. contro I = 1 =>( literal <-bus.a.old);next 
latch.flags=1 =>( 

flags<5>«-flags<15>;next 

flags<6>«-carry< 15>;next 

flags < 10> <-al u.out <0>; next 

flags< 11 ><-0;next 

alu.out=0=>(flags< 11 >«-1 );next 

flags< 1 2><-alu.out< 15>;next 

flags < 14><-carry< 16>;next 

flags < 13><-flags< 14>©flags<6>;next 

flags<9>«-flags<12>©flags<13>;next 

flags<7>«-flags< 11 > Vflags<9>;next 

flags<8>«- _ '(flags< 14> Vflags<11 >);next);next);next 

) end of instruction execution 


Reference: 


1 . C. G. Bell, A. Newell, "Computer Structures: Readings and Examples", Chap. 2, 
McGraw-Hill, 1971, 
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Summary of Commands 


Transfer Phase: PHI 1 


Literal Bus B Source Bus B Destination Bus A Source Bus A Destination 

Control 


0M2 


Bus A 

Source 

Literal Control 

Bus B 

Source 

Onnnn 

Register n 

000 

Microcode In 

Onnnn 

Register n 

10000 

Right Port Pins 

001 

Illegal 

10000 

Right Port Pins 

10001 

Right Port Latch 

010 

Literal In 

10001 

Right Port Latch 

10010 

Left Port Pins 

01 1 

Illegal 

, 10010 

Left Port Pins 

1001 1 

Left Port Latch 

10O 

Execute old A Bus 

10011 

Left Port Latch 

10100 

ALU Output Latch A 

101 

Illegal 

10100 

ALU Output Latch A 

10101 

ALU Output Latch B 

1 10 

A Bus gets old A Bus 

10101 

ALU Output Latch B 

101 10 

Flag Register 

111 

Literal Out 

other 

No Source 


Literal (see Literal Control) 

[_ 




other 

No Source 


LSB of the Latching Field 






during last PHI 2 




Bus A 

Destination 

Bus B 

Destination 

Onnnn 

Register n 

OOnnnn 

Register n 

10000 

Left Port, drive now 

010000 

Left Port, drive now 

10001 

Left Port, drive PHI 2 

010001 

Left Port, drive PHI 2 

1001 x 

Left Port, no drive 

OlOOIx 

Left Port, no drive 

10100 

Right Port, drive now 

010100 

Right Port, drive now 

10101 

Right Port, drive PHI 2 

010101 

Right Port, drive PHI 2 

101 lx 

Right Port, no drive 

0101 lx 

Right Port, no dirve 

1 1000 

ALU Input Latch A 

01lOxx 

ALU Input Latch B 

1 1001 

ALU Input Latch A gets Shift Out 

011Ixx 

No Destination 

1 1010 

1 101 1 

ALU Input Latch A gets Shift Control 

Flag Register , 

1Onnnn 

ALU Input Latch B gets shift 
output, shift constants 

other 

Do Destination 

11nnnn 

ALU Input Latch B gets shift 


control, shift constants 


Operation Phase: PHI 2 


ALU Operation 


ALU Operation 


1000 

01 10 

0110 

00 

00 

Add 

1000 

01 10 

01 10 

00 

01 

Add with Carry 

0100 

1001 

01 10 

00 

10 

Subtract 

0010 

1001 

01 10 

00 

10 

Subtract Reversed 

0100 

1001 

01 10 

00 

01 

Subtract with Borrow 

0010 

1001 

01 10 

00 

01 

Subtract Reversed with Borrow 

001 1 

1100 

0110 

00 

10 

Negative A 

0101 

1010 

01 10 

00 

10 

Negative B 

1100 

0011 

01 10 

00 

10 

Increment A 

1010 

0101 

01 10 

00 

10 

Increment B 

001 1 

1100 

1001 

00 

10 

Decrement A 

0101 

1010 

1001 

00 

10 

Decrement B 

0000 

0001 

001 1 

00 

00 

Logical AND 

0000 

01 1 1 

001 1 

00 

00 

Logical OR 

0000 

01 10 

001 1 

00 

00 

Logical Exclusive Or 

0000 

1 100 

001 1 

00 

00 

Not A 

0000 

1010 

001 1 

00 

00 

Not B 

0000 

001 1 

001 1 

00 

00 

A 

0000 

0101 

001 1 

00 

00 

B 

1000 

0111 

01 1 1 

01 

00 

Multiply Step 

1 100 

1111 

1111 

10 

00 

Divide Step 

0000 

01 1 1 

001 1 

11 

00 

Conditional AND'OR 

0101 

1010 

0001 

00 

10 

Generate Mock 

uuuu 

uuuu 

uuuu 

uu 

uu 

1 

User Defined Op 





L_ 

Carry In Select Field 


Flag Select Carry In Latching 
Select Field 


Carry In Select 

00 o 
01 Flagbit 
10 1 

11 Flagbit Complimented 


Flag Select 

000 Old Flagbit 

001 Carry Out 

010 MSB 

011 Zero 

100 Less than flag 

101 Less than or equal flag 

110 Higher flag 

111 Overflow 


Latching Field 

1 xxx Latch Flags 

x 1 xx Load ALU Output Latch A 

xxlx Loud ALU Output Latch B 

xxxl Literal bits get old A Bus next PH! 1 

0000 Nop 


(figl 0 sil) 





Coding Form - 0M2 [Ch.s.. Sect2] 



1 sil) 
































